This week on twitter 2009-09-27
- now that I managed to insert 7000 pages in #wordpress I have another problem. Page admin super slow #
- @mihaibrehar yeah I noticed it actually fetches ALL pages. Why would it do that if it has pagination ? in reply to mihaibrehar #
- @filipcte yeah, food is for wimps in reply to filipcte #
- @Yesmail doesn't this go against every permission based marketing rule out there? some might even consider this as spam ... in reply to Yesmail #
- I had no idea @ginatrapani has a wife, but it seems so from this article: http://tinyurl.com/l5g7dy in reply to ginatrapani #
- what's wrong with the #openid #wordpress plugin? One day it works the next day it doesn't #
- problem found on #openid #wordpress. wp-contactform http://bit.ly/ejoWh using add_options_page in admin_head instead of admin_menu action #
- wow, looks like btrfs is the holy grail of filesystems http://bit.ly/lbXuK . #
- @Yesmail whoever gives permission to perfectly unknown strangers to send messages to them must not be in their right mind in reply to Yesmail #
- @Yesmail and everybody ( on the receiving side ) hates email list rental just like postal list rental ... or possibly even more in reply to Yesmail #
- @iulia_nalatan stupid, but many in reply to iulia_nalatan #
- @brassblogs I find this to be a nice feature. If you're a web dev or designer it saves you from some copy/paste in reply to brassblogs #
- RT: @nowsourcing: Overheard in kid's class this morning: "I don't want to buy Netflix, you idiots!" (so much for pop-up ads). #
- @filipcte yes in reply to filipcte #
- @mariussescu the thing with the capital is not that important, but those 3 years of tax exemption are too much ... smells like an election campaign ? in reply to mariussescu #
- Now you can manage 7000 pages in the #wordpress page admin http://bit.ly/QNf65 #
- SpriteMe http://ff.im/8BUJn #
- If you get any spammy dm from my twitter account, please let me know, It's not me but who knows who has my password ... maybe even a worm #
- @swhitley small and unimportant naming inconsistency in reply to swhitley #
- now you can share your amazon EBS snapshots http://tinyurl.com/yeq4kla #
- @andreisavu dude wtf is that: "interfata api" ( "api interface" ) ? isn't api enough ? in reply to andreisavu #
- @thomasfuchs very useful when you don't want them to know you sent the message at 3 am in reply to thomasfuchs #
- RT @mariussescu : Seth Godin Tries Out Brandjacking http://bit.ly/EH7WI #
- RT: @chrisgarrett: Priceless. Lily Allen rants about piracy, discovers she is a blatant pirate herself http://bit.ly/1fg5HJ #
- Goodmail in my inbox on unsolicited mail http://ff.im/8EhMv #
- @AnnePMitchell There are buttons, the problem is they don't report the spam anywhere. I'm really disappointed with thunderbird about this in reply to AnnePMitchell #
- anyone knows how to make evince go to the next page when I use the scroll wheel or if that's even possible ? #
- @denniston it depends where you send the messages. If all the addresses are at yahoo I doubt you can do even 20k , maybe with multiple ips in reply to denniston #
- @denniston it also depends on the content you are sending and on the qmail setup you have ( patch bundle magic ) in reply to denniston #
- I wonder why everyone seems so surprised about the new $100 million vc funding for twitter? #
- @dmuth he wrote man pages though... in reply to dmuth #
- anyone has any idea what needs to be done to german characters in post content so wp_insert_post doesn't break the post ? #wordpress #
- Adjusting as we go http://ff.im/8I99i #
- The "Bulletproof" Button | Silverpop http://ff.im/8Izo6 #
- glad to see my tutorial about setting up squid digest auth ( for 2.6) works on 2.7 too http://tinyurl.com/76jhfg #
- trying to connect to youtube through a proxy on amazon ec2 and it just won't let me #
- bank advice: only use our site to login. I bet all of the phishing victims think they ARE using the bank's site #
Powered by Twitter Tools
Faster wordpress page admin
In a recent post about wordpress I explained how you can create a lot of pages really fast. If that went well and you inserted a few thousand pages in your wordpress blog the page admin became useless. Displaying the list of pages would take 3 minutes for 7000 pages on my test server.
It seems this is not a new problem and there is a bug created in 2007 about it. Although it seems like there was a patch to fix this, the problem still exists in the 2.8.4 version.
Why is this so slow ?
Short story: because wordpress is trying to display and sort pages hierarchically .
At first I thought the problem was caused by the sql queries that fetched all the pages ( even though it doesn't display all of them on a page ) but that was not the case.
After profiling the code with xdebug and Kcachegrind I found there were a few parts of the code that were taking the longest time to complete.
The main problem is that wordpress finds the children of every page in an inefficient way. The function get_page_children in wp-includes/post.php was taking about two thirds of the total time ( ~ 2 minutes in my example ).
The Solution
I rewrote that function to make it a lot more efficient. In my case it reduced the time from 2 minutes to 1-2 seconds, but with other page hierarchies it might take longer, the worst case being when every page is the parent of another page. The diff is here : Fast page_get_children-0.1 (1.05 KB)
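To give an idea of what "more efficient" can look like, here is a minimal sketch. It is not the code from the diff above, just one common way to do it: index the pages by parent once, then walk down from the requested page without rescanning the whole list. The function name fast_page_children is made up for this example, and the pages are assumed to be plain objects with ID and post_parent properties, like the rows wordpress passes around.

```php
<?php
// Hypothetical helper, NOT the patched get_page_children from the diff.
// $pages: array of objects with ->ID and ->post_parent (assumed shape).
function fast_page_children($page_id, array $pages) {
    // Pass 1: group pages by their parent ID (one scan of the list).
    $by_parent = array();
    foreach ($pages as $page) {
        $by_parent[$page->post_parent][] = $page;
    }

    // Pass 2: walk down from $page_id using the index only.
    $children = array();
    $stack = array($page_id);
    $seen = array($page_id => true); // guards against broken, cyclic parents
    while ($stack) {
        $parent = array_pop($stack);
        if (empty($by_parent[$parent])) {
            continue;
        }
        foreach ($by_parent[$parent] as $child) {
            if (isset($seen[$child->ID])) {
                continue;
            }
            $seen[$child->ID] = true;
            $children[] = $child;
            $stack[] = $child->ID;
        }
    }
    return $children;
}
```

Each page is looked at a constant number of times, so the cost grows linearly with the number of pages instead of with its square. Note that the order of the returned descendants may differ from what the stock function returns.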
The second problem is that wordpress updates the page cache every time you list pages. This was taking almost 1 minute to complete. I'm not sure if it's the right thing to just remove that call to update_page_cache in wp-includes/post.php get_pages , but doing that made the page admin load in about 15 seconds.
Now this might still be annoying but it's way better than 3 minutes. Hopefully at least the new get_page_children function will be included in the next wordpress release... maybe you can help promote this ticket by giving it a positive vote, although I'm not sure if those votes actually have any influence.
This week on twitter 2009-09-20
- goosh.org - the unofficial google shell. http://ff.im/84SRG #
- change your business name, don't let me know about it and then send me a newsletter months/years after. What do I see? spam #
- I just applied for @suretymail accreditation for the email marketing service I'm going to launch soon #
- @mihaibrehar I think they haven't heard of "eat your own dog food" #
- @mentby how is this different from simscan ? #
- @abhishekrungta what problems do you have exactly? dm me or send me a message through http://patchlog.com/contact/ #
- i just had this idea: remote bash completion ... and of course ... it's already done #
- I just set up the http://www.wibiya.com toolbar on my blog. Check it out at http://patchlog.com #
- Checking out @shoemoney's SEO checkup Tool Special Offer http://bit.ly/175u4H #
- @jschuller I often find myself in the situation where I have to explain to the customer that what they want is not going to do them good in reply to jschuller #
- @Scobleizer I care more about history, passwords and session sync then bookmarks in reply to Scobleizer #
- @chriswallace me too. I think it's mostly because I do a lot of work in console and I have to remember to quote the whole filename in reply to chriswallace #
- @yoast http://bit.ly/1LpCTT in reply to yoast #
- @yoast or this one : http://tinyurl.com/mp7ou3 if you don't want to use curl and just remove the part with if($Key=='Location')... in reply to yoast #
- @tudormoldovan it will be a WIN when the ones already sold evaporate in reply to tudormoldovan #
- Friction http://ff.im/8cTMB #
- How HTML Code Affects E-Mail Deliverability - ClickZ http://ff.im/8d73C #
- @Scobleizer what do you mean it has two boxes? and which one is that ? in reply to Scobleizer #
- @jaybaer actually a lot of people base their opinion on what others say. So when they say "based on my exp..." it's the opposite of that in reply to jaybaer #
- vampire Bill wants to be a spammer http://bit.ly/17fFuN #
- @oldmanuk nice trick indeed but kind of useless in reply to oldmanuk #
- what's up with people posting single work tweets ? like "qmail" ... #
- @wptavern 300+? that' in reply to wptavern #
- @wptavern that's scarry in reply to wptavern #
- @wptavern and a lot of patience in reply to wptavern #
- Carsonified & Why You Should Switch from Subversion to Git http://ff.im/8kKvB #
- @mariussescu wow, startrek in reply to mariussescu #
- The History of Hacking (TIMELINE) http://ff.im/8liUi #
- seems like yahoo mail has a new look #
- Direct Mail, Email, and the "Teaser" Concept http://ff.im/8lGQe #
- I wish the "the_content" filter would pass the post_id to the filter function too #wordpress #
- @ozh @woork very good job indeed, looks just like a #wordpress site in reply to ozh #
- The priority list http://ff.im/8nPh5 #
- What Star Trek Predicts About The Future of Information Security http://ff.im/8nQgc #
- #twitter follow notification messages are useless, I want the bio, few latest tweets, maybe even a direct follow button #
- @DominiqueGoh mo issue here in reply to DominiqueGoh #
- Blogging to Learn http://ff.im/8o2QB #
- Via @AnnePMitchell Use Facebook and Gmail? Your Gmail Password May be at Risk! http://bit.ly/I6h7D #
- @swhitley yeah, not much improvement over the plain text one with just a link to the profile. in reply to swhitley #
- a feature I'd love 2 see in google reader: send comment 2 the original article, like stumbleupon sends 2 friendfeed when I review a site #
- RT: @nonsequitir: Bankers getting bonuses again is like Bin Laden get air miles points for 9/11 #mocktheweek #
- can't wait for this: RT: @lorelleonwp: Looks like the WordPressMU and WordPress merger will be in version 3.0, says @photomatt #
- @wptavern now that's how secure software should be created ... bonus points if it can also publish a post or two by itself in reply to wptavern #
- @Sengupta programmers don't change light bulbs , that's the job of the hardware people in reply to Sengupta #
- JSNES: A NES emulator written entirely in Javascript http://ff.im/8pLQe #
- JSNES: A Javascript NES emulator http://ff.im/8pMMZ #
- how do I scale #wordpress for 7000 pages? only have 700 now and it's too slow already #
- @nonsequitir super-cache speeds it up if you have too much traffic, my problem is I have many pages, Inserting a page takes 20 seconds in reply to nonsequitir #
- @mihaibrehar yeah, found the problem, wp_insert_post was clearing the page cache with every page inserted and I was inserting a lot of them in reply to mihaibrehar #
- @nonsequitir I think it's because I'm inserting a lot of pages and with every insert wp_insert_post is clearing the page cache in reply to nonsequitir #
- the mass page maker #wordpress plugin : http://tinyurl.com/lxk2z2 ... if only I knew about it before I started my import project ... #
- and if you just want to remove a lot of pages easily: http://bit.ly/TFUP6 #wordpress #
- 36 minutes to insert 5000 pages into wordpress after discovering the problem: http://bit.ly/6QqFS #
- Tactics for Reactivating Non-Responders http://ff.im/8qmhf #
- If TV ads were free http://ff.im/8qn2H #
Fast page insert in wordpress
You want to create a lot of pages in wordpress using a script, to populate a blog with content you already have. To make sure your script will be compatible with future versions of wordpress you want to use wp_insert_post.
The Problem
The more pages you add the slower your script will be. With about 700 pages my script took 12 seconds to add a new page.
After digging through the code I found out that wp_insert_post calls $wp_rewrite->flush_rules() every time a new post or page is inserted and that this is what takes the most time to finish.
Now it makes sense: the more pages you have, the more rules ( permalinks ) you have and the more time that function call takes to finish.
The Solution
The call to $wp_rewrite->flush_rules() can be disabled by defining WP_IMPORTING. Now inserting a post takes just a second or less. But you still have to call $wp_rewrite->flush_rules() after you're done inserting all posts. This call will take quite a long time depending on the total number of posts/pages you have, but it's a lot better to call it only once than a few hundred or a few thousand times.
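Here is a minimal sketch of how such an import script could be laid out. The helper name import_pages, the shape of the $pages array and the wp-load.php path are assumptions made for this example; only WP_IMPORTING, wp_insert_post and $wp_rewrite->flush_rules() come from wordpress itself.

```php
<?php
// Define this before the first insert so wordpress skips the
// rewrite rule flush on every saved page.
define('WP_IMPORTING', true);

// Hypothetical helper. $pages is assumed to be a list of
// array('title' => ..., 'content' => ...) entries.
function import_pages(array $pages) {
    global $wp_rewrite;
    $ids = array();
    foreach ($pages as $page) {
        $ids[] = wp_insert_post(array(
            'post_type'    => 'page',
            'post_status'  => 'publish',
            'post_title'   => $page['title'],
            'post_content' => $page['content'],
        ));
    }
    // Rebuild the rewrite rules once, after everything is inserted.
    $wp_rewrite->flush_rules();
    return $ids;
}

// Usage, from a script that loads wordpress first (path is an assumption):
// require dirname(__FILE__) . '/wp-load.php';
// import_pages(array(array('title' => 'Hello', 'content' => 'First page')));
```

The single flush at the end is still slow on a big blog, but you pay for it once instead of once per page.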
The way wordpress updates its rules needs to change. Even if we can solve the bulk import problem by calling flush_rules at the end, we end up with a blog with thousands of pages where trying to publish a new post manually might take 30 seconds or more.
This week on twitter 2009-09-13
- Why Nobody Reads Your Blog http://ff.im/7LDOA #
- just upgraded an etch ami to lenny, thanks: http://bit.ly/Ynpvl #
- json_decode was buggy on php 5.2.0 (etch ) #
- Email Marketing Works Because... http://ff.im/7Peue #
- I got a lot of followers in the last few minutes. I wonder why... Please drop me a reply if you just started following me. #
- if you're not a bot, that is #
- @jockr @yosit thanks for letting me know in reply to jockr #
- Thanks to everyone that let me know about the tweetdeck directory #
- wow I just noticed I have 31 active plugins in wordpress. Should I be worried ? What's your number? #
- The big drop off http://ff.im/7TcNL #
- RT: @elenaplop: Reading Agile vs Waterfall vs Iterative vs Lean Software Development - In Pictures! http://bit.ly/LlAoV #
- can firefox remember passwords by URL instead of domain name ? #
- Encrypting Your Dropbox Seamlessly and Automatically & Pragmattica http://ff.im/7U9ro #
- change firefoxs saved passwords - snarfed.org http://ff.im/7U9rm #
- RT: @wptavern: LOL reviewing the WordPress Dev Meeting yesterday and saw @westi write Automatic as Automattic. They have him brainwashed! #
- I used chromium for a few hours yesterday. It really seems faster then FF 3.5 but without Firebug it's almost useless to me #
- RT: @vladstan: RT @adriana_cocic: Russia's Pres. Medvedev has decreed a new holiday 4 the country: http://bit.ly/e1ao3 (via @bogdanlucaciu) #
- Programmers top 10 sentences http://ff.im/81jGu #
- The difference between a unique index and primary key in MySQL http://ff.im/82cLF #
- Feedback loops being replaced by engagement? | MailChimp Email Marketing Blog http://ff.im/82xKM #
- http://bit.ly/3rDPjM http://ff.im/82Jcx #
Qmail Big concurrency
Wanna send messages faster with your qmail server? Everyone will tell you to increase the remote concurrency, until you find out that it can only go as high as 255. If you want more than that you have to apply the big concurrency patch.
The Problem
Applying the patch and setting concurrency ( conf-spawn ) higher than 509 will break the compilation. I was hoping to get at least 1000 on that new quad core.
Why 509? It seems the number depends on the maximum size of the FD_SET array used for "monitoring" ( using select ) the file descriptors ( connection sockets or opened files ) . This limit is set in FD_SETSIZE constant to 1024. In case you're wondering ... the formula that gets you from 1024 to 509 is (FD_SETSIZE-5)/2 ( from chkspawn.c )
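To make the arithmetic concrete, here is a small sketch of that formula and its inverse. It is written in php only to stay consistent with the other examples on this blog; the function names are made up, and the integer ( rounded down ) division is an assumption based on the fact that 1024 gives 509.

```php
<?php
// Largest conf-spawn allowed for a given FD_SETSIZE,
// following the (FD_SETSIZE-5)/2 formula quoted from chkspawn.c.
function max_spawn($fd_setsize) {
    return (int) floor(($fd_setsize - 5) / 2);
}

// Smallest FD_SETSIZE needed for a given concurrency (formula inverted).
function min_fd_setsize($spawn) {
    return 2 * $spawn + 5;
}

echo max_spawn(1024), "\n";      // 509
echo min_fd_setsize(1000), "\n"; // 2005
```

So for 1000 concurrent deliveries FD_SETSIZE has to be at least 2005, about double the default.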
Trying to define FD_SETSIZE to a higher value in conf-cc like this -DFD_SETSIZE=4096 doesn't work because FD_SETSIZE is redefined in sys/select.h like this
#define FD_SETSIZE __FD_SETSIZE
__FD_SETSIZE is defined somewhere in /usr/include/bits/types.h ( actually typesizes.h ) to 1024. Defining -D__FD_SETSIZE doesn't work either...I even tried both and still no luck.
The Solution
After hours of digging through mail archives and sites I found this mailing list post that really helped:
Re: fd_setsize
If you just want to get it working, download my patch qmail Big Concurrency fix-1.0 (898 bytes) , apply it ( after you apply the big concurrency patch ) , set conf-spawn to something big ( but less than 65000 ) and then you should be able to compile qmail.
If you want to know how it's done, read below...
It seems like the solution is to include bits/types.h, undefine __FD_SETSIZE and then define it to a higher value. The author of that post says that this is not a good idea (from the portability point of view, but I don't care about that ) since programs should never directly include bits/types.h ( true ) but the alternative is to modify that system file, again not a good idea since it will be overwritten by a possible update.
My first idea was to just put the code from the mailing list post into select.h2 , since this is the file used to generate select.h, and select.h is included in spawn.c and chkspawn.c. This didn't work because spawn.c was including "select.h" after "sys/types.h", so even if select.h defined __FD_SETSIZE it would be useless, since FD_SETSIZE ( this is the one that really matters ) would have already been defined in sys/types.h .
The solution I found at the time was to just move "select.h" at the top of the file and remove "sys/types.h" since it was already included from select.h but now I realized I could have just as well undefined and defined FD_SETSIZE too inside select.h
And that's the story about how I got to run 1000 concurrent connections in qmail.
The real problem
Now that we can have so much concurrency we hit another wall. Qmail, as most other MTAs, doesn't have any way of controlling the remote concurrency per destination domain.
At 1000 simultaneous connections it's very likely that it would create a few tens or hundreds of connections simultaneously to the same domain.
When this happens that domain will just ban your ip. So how do we fix this one?
PS: I have an answer but I want to see what you have for a solution
so hit the comments...
php tail
The Problem
You want the functionality of tail(1) in a php function.
The Solution
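Below is a php function that gives you the last N lines from a file. It doesn't do everything tail(1) does ( like following, retrying, etc ) but just the basic stuff:
function tail($file, $num_to_get=10) {
  $fp = fopen($file, 'r');
  if ($fp === false) return false;
  $position = filesize($file); // read backwards from the end of the file
  $chunklen = 4096;
  $ret = ""; $lc = 0;          // $lc counts the newlines seen so far
  while ($position > 0) {
    $len = min($chunklen, $position);
    $position -= $len;
    fseek($fp, $position);
    $data = fread($fp, $len);
    for ($i = strlen($data) - 1; $i >= 0; $i--) {
      if ($data[$i] == "\n") {
        // a last line without a trailing newline still counts as a line
        if ($lc == 0 && $ret != "") $lc++;
        $lc++;
        if ($lc > $num_to_get) { fclose($fp); return $ret; }
      }
      $ret = $data[$i] . $ret;
    }
  }
  fclose($fp);
  return $ret;
}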
It may not be the fastest solution but it works.
Have a better one? Please let me know about it.
This week on twitter 2009-09-06
- Anyone tried SponsoredTweets yet? How is it working for you? http://bit.ly/VCcIx #
- Computer Programming Algorithms Directory http://ff.im/7xb8x #
- Molecular Expressions: The Silicon Zoo - Runaway Train http://ff.im/7xb8y #
- http://whattheinternetknowsaboutyou.com/ http://ff.im/7yTRF #
- Clean Your SQLite Databases to Speed up Firefox | Firefox Facts http://ff.im/7yTRG #
- @divinewrite Thanks mate! You have a lot of great info in those articles. in reply to divinewrite #
- Track SEO rankings and Sitelinks with Google Analytics II http://ff.im/7AtOi #
PatchLog