Something went wrong and your MySQL replication broke. I'm talking here about problems with the SQL thread, not connection problems.
The SQL thread shows you an error; what do you do to fix it and resume replication?
Here are 3 ways to fix it. Each has advantages and disadvantages, so pick the one that fits your problem best.
1. Skip over the problem
You can try to simply skip over the statement that broke replication by changing the position in the log file.
There are two ways to do this:
a) skip gradually
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
SHOW SLAVE STATUS \G
That skips the next statement, but you can set the counter higher to skip more than one.
Repeat until the slave status shows the SQL thread is running.
b) skip to the current position
Use this if the first method keeps turning up more statements that break replication and you don't have time to skip them one by one.
First, on the master, run SHOW MASTER STATUS to find the current binary log file and the current position within that file.
Then, on the slave, stop replication and change the file name and position. Something like:
STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001958', MASTER_LOG_POS=52562937;
START SLAVE;
But do that with your own file name and position taken from the master.
Check the replication status with SHOW SLAVE STATUS.
If the results are good (both Slave_IO_Running and Slave_SQL_Running are Yes) you can go to the next step; otherwise skip to the next methods.
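That check can also be scripted by parsing the \G output. A minimal sketch; the status text below is a hardcoded sample standing in for the real output of mysql -e 'SHOW SLAVE STATUS\G':

```shell
# Sketch: check both replication threads from SHOW SLAVE STATUS \G output.
# The status text is a hardcoded sample; in practice pipe in the client output.
status='             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes'

io=$(printf '%s\n' "$status" | awk '/Slave_IO_Running:/ {print $2}')
sql=$(printf '%s\n' "$status" | awk '/Slave_SQL_Running:/ {print $2}')

if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
    echo "replication OK"
else
    echo "replication broken: IO=$io SQL=$sql"
fi
```

Handy if you want a cron job to alert you the next time the SQL thread dies.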
At this point you have working replication, but the data on the slave is probably not the same as on the master, since you skipped a few SQL statements.
To fix that you can use Maatkit (mk-table-checksum and mk-table-sync).
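The Maatkit pass might look something like this. Host names, credentials and the checksum table name are placeholders, and options vary between Maatkit versions, so treat it as a sketch rather than a recipe:

```shell
# Sketch only: placeholder hosts/credentials; check your Maatkit version's docs.
# 1) On the master, checksum the tables and store the results in a table
#    that replicates down to the slave:
mk-table-checksum --replicate test.checksum h=master_host,u=root
# 2) Then fix the differences found on the slave:
mk-table-sync --execute --replicate test.checksum h=master_host,u=root
```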
2. Full Dump and Restore
Connect to the master, dump everything into a SQL file, copy it to the replication slave and load it into MySQL.
Use --master-data so the replication coordinates are written into the dump file and the slave will know where to start.
Use --disable-keys so the slave will not try to build indexes after each insert but only builds them at the end of the import.
Use --add-locks to surround each table dump with LOCK TABLES/UNLOCK TABLES; this makes the inserts faster on the slave.
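Put together, the dump might look like this. The user and file name are placeholders; note that --opt (the default in modern mysqldump) already implies --disable-keys and --add-locks:

```shell
# Sketch with placeholder credentials; run on the master.
mysqldump -u root -p --master-data --disable-keys --add-locks \
    --all-databases > full_dump.sql
# Copy the file to the slave, then load it there:
mysql -u root -p < full_dump.sql
```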
Problem:
--master-data puts a read lock on the tables, so writes on your master will block until the dump finishes. On large databases this can take a long time, which may be unacceptable.
Possible fix:
If you have InnoDB tables, add --single-transaction: a single global lock is then held only for a short time at the beginning of the dump, and the data is read from a consistent snapshot.
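For an all-InnoDB server the dump command would then become (again a sketch with placeholder credentials):

```shell
# InnoDB-only sketch: consistent snapshot without holding the read lock
# for the whole dump.
mysqldump -u root -p --master-data --single-transaction \
    --all-databases > full_dump.sql
```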
The problem is also smaller if you can take filesystem snapshots on the master, such as those created by LVM.
3. Inconsistent Full Dump
This is just another fix for the locking problem in method #2. Dump the data just like before but without --master-data. This means no locks, so the master can keep working normally.
But because you don't use --master-data, you will have to set the position on the slave yourself.
On the master type:
SHOW MASTER STATUS \G
Take the file name and position and use them in the CHANGE MASTER statement on the slave (after you load the dump file). Something like:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001958', MASTER_LOG_POS=52562937;
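The whole slave-side sequence might look like this; the dump file name, log file and position are placeholders you must replace with your own values from the master:

```shell
# Sketch: load the lock-free dump, then point the slave at the coordinates
# you noted from SHOW MASTER STATUS. All values are placeholders.
mysql -u root -p < full_dump.sql
mysql -u root -p -e "STOP SLAVE;
    CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.001958', MASTER_LOG_POS=52562937;
    START SLAVE;"
```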
Of course, all of this will create an inconsistent slave, but you can fix that easily with Maatkit.
If you know other methods I'd love to hear about them. Let me know in the comments.