You transfer files to a server over scp or sftp, and a script on that server processes the new files. You want the script to start processing a file only once it has been completely transferred.
There's no built-in way of knowing when a file has been fully transferred: both sftp and scp create the file as soon as the transfer begins and close it when the transfer finishes. Between those two moments the file is incomplete.
There's an easy solution: upload a lock file before you start uploading the real files and remove the lock file after the upload is finished. Modify your processing program/script to look for the lock file and only start processing if it does not exist. This works well if you can modify both the upload and the processing scripts/programs, but that's not always the case.
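As a rough illustration of the processing side of the lock file approach, here is a minimal sketch. The lock file name .upload.lock and the function name ready_to_process are assumptions made for this example; use whatever names your upload script actually creates.

```shell
#!/bin/sh
# Assumed lock file name; it must match what the uploader creates.
LOCK_NAME=".upload.lock"

# ready_to_process DIR
# Returns 0 if DIR exists and contains no lock file, non-zero otherwise.
ready_to_process() {
    dir=$1
    [ -d "$dir" ] && [ ! -e "$dir/$LOCK_NAME" ]
}
```

The processing script would call ready_to_process /home/user/incoming and skip the run when it fails. On the upload side this means transferring an empty file named .upload.lock first, then the real files, then deleting the lock file (for example with the rm command of sftp).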
The harder solution involves modifying the openssh source code. I created a patch that modifies scp and the sftp server so that, for every file received, the server writes the contents to a temporary file and only moves it to the real destination when (and if) the upload is complete. The move operation (rename) is atomic only when the source and destination are on the same filesystem, but that's not a big problem because we can configure the temporary location to be on the same filesystem.
Both scp and the sftp server were modified, so you get the same behavior with either of them.
This patch was tested with openssh 4.6p1. It may work with newer versions, but you should first try it with the same version, so download the source code for 4.6p1 and decompress it.
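The idea behind the patch (write to a temporary file, then rename) can be sketched in plain shell. This is not the patch itself, only an illustration of the same technique; the function name atomic_put and the upload.XXXXXX template are made up for this example.

```shell
#!/bin/sh
# atomic_put SRC DEST TMPDIR
# Copies SRC into a temporary file under TMPDIR, then renames it to DEST.
# The rename is atomic only if TMPDIR and DEST are on the same filesystem;
# otherwise mv falls back to copy and delete.
atomic_put() {
    src=$1; dest=$2; tmpdir=$3
    tmpfile=$(mktemp "$tmpdir/upload.XXXXXX") || return 1
    if cat "$src" > "$tmpfile"; then
        # The data is complete, so publish it under its final name.
        mv -f "$tmpfile" "$dest" || { rm -f "$tmpfile"; return 1; }
    else
        # The copy failed, so no partial file ever appears at DEST.
        rm -f "$tmpfile"
        return 1
    fi
}
```

A reader of the destination folder sees either no file or the complete file, never a partial one. Note that mktemp creates the file readable only by its owner, so a real script may need a chmod before the mv.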
Download my patch: [download#7]
Apply the patch:
- cd openssh-4.6p1
- patch -p1 < openssh_scp_sftp_atomic.diff
Then run configure with whatever parameters you want, make and install it.
By default scp and sftp-server use /tmp as the temporary location where they save files until the upload is complete.
If /tmp is not on the same filesystem as the actual file destination, you have to specify a different temporary location to make the move really atomic.
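If you are not sure whether two folders are on the same filesystem, a quick check along these lines may help. It is a sketch that compares the mount points reported by df -P; the function names are invented for this example, and it assumes the device name printed by df contains no spaces.

```shell
#!/bin/sh
# mount_point PATH
# Prints the mount point of the filesystem that contains PATH.
mount_point() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { print $6 }'
}

# same_filesystem PATH1 PATH2
# Returns 0 if both paths report the same mount point, non-zero otherwise.
same_filesystem() {
    a=$(mount_point "$1")
    b=$(mount_point "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_filesystem /tmp /home/user && echo same would print "same" only when /tmp and /home/user live on one filesystem.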
For sftp-server you can do this by adding another parameter to the Subsystem line in sshd_config.
It normally looks like this (on gentoo x86_64) :
Subsystem sftp /usr/lib64/misc/sftp-server
or ( on ubuntu 9.04 )
Subsystem sftp /usr/lib/openssh/sftp-server
You have to add " -t /new/tmp/location " to that line ( without the quotes )
/new/tmp/location should be on the same filesystem as the real destination.
For example, if you have /home mounted on a separate partition and you upload to /home/user, you should create a temporary folder in /home and set that as the folder to be used by sftp-server.
- mkdir /home/tmp
- chmod 1777 /home/tmp # all write/read and sticky
And the configuration line should be something like :
Subsystem sftp /usr/lib/openssh/sftp-server -t /home/tmp
Scp also needs special configuration if you want to set a different temporary location. In this case we cannot just pass an extra parameter to it, because the scp client would not allow that, so I had to make a wrapper for the scp program on the server.
The wrapper just passes the custom temporary location in an environment variable and then calls the actual (patched) scp program.
I had scp in /usr/bin/scp, so I moved it to /usr/bin/scp.bin
and created a script named /usr/bin/scp with the following content:
- #!/bin/sh
- export TMP=/home/tmp
- scp.bin "$@"
All that's left to do is:
- chmod 755 /usr/bin/scp
That's it! Now you have atomic uploads for scp and sftp.