Tuesday, February 8, 2011

Speeding up your secure file-transfer by using tar and ssh

Normally when transferring files between computers, one would use 'scp', and for a directory 'scp -r'. The problem with this is that scp handles files one at a time over its connection and waits for the other side to acknowledge each one, and this adds a lot of overhead when there are many small files.

To speed up the transfer of many small files, it is better to use the beloved Unix pipe, together with tar (the archiving software) and SSH (Secure SHell). What we do is 'tarball' all these files and directories, compress them if we would like to, and pipe the result to ssh, which on the other computer either stores it as a (compressed) tarball or unpacks it back into the original directory structure.

To decrease the use of bandwidth, we also set tar to filter the output through gzip, a compression program. This is done using the '-z' flag. The other two flags we use are '-c' and '-f', which mean 'create archive' and 'which file to write to' (in our case '-', which means stdout) respectively.
GNU tar on most Linux systems uses stdout if nothing else is specified, so there we could actually skip the last flag. That default is chosen when tar is built and can be overridden with the TAPE environment variable, so on other systems it may be a tape device instead.

Using scp, it would look something like this:
scp -r directory/ user@host:~/

But with tar and ssh, we instead write it like:
tar -zcf - directory/ | ssh user@host "tar zxvf -"
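
Before pointing this at a real host, the same pipe can be tried locally. The sketch below is only a dry run under my own assumptions: two temporary directories stand in for the local machine and the remote host, and the receiving tar runs directly after the pipe instead of behind ssh. The file names are made up for the example.

```shell
#!/bin/sh
# Local dry run of the tar pipe: nothing leaves this computer.

src=$(mktemp -d)    # stands in for the local machine
dest=$(mktemp -d)   # stands in for the remote host

# Made-up sample data: a directory with one nested subdirectory
mkdir -p "$src/directory/sub"
echo "hello" > "$src/directory/a.txt"
echo "world" > "$src/directory/sub/b.txt"

# Sender:   -c create, -z gzip, -f - write the archive to stdout
# Receiver: -x extract, -z gunzip, -f - read the archive from stdin
# -C changes directory first, so archive paths start at 'directory/'
tar -zcf - -C "$src" directory | tar -zxf - -C "$dest"

# List what arrived on the "remote" side
find "$dest" -type f | sort
```

If the listing shows both files under directory/, the flags do what we want, and the receiving half can be moved behind ssh.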

I have created a directory on the receiving host called 'backup', so I want to put it into that directory instead:
tar -zcf - directory/ | ssh user@host "tar zxvf - -C backup/"

or, if we remove the '-f' flag with its corresponding argument (this only works where tar defaults to stdin and stdout):
tar -zc directory/ | ssh user@host "tar zxv -C backup/"
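
The introduction mentions that the other computer can also keep the data as a compressed tarball instead of unpacking it. A minimal sketch of that variant follows. The function name push_tarball and its arguments are my own invention for illustration, and user@host and the file name are placeholders.

```shell
#!/bin/sh
# push_tarball DIR USER@HOST REMOTE_FILE
# Streams DIR as a gzip-compressed tar archive and saves it as REMOTE_FILE
# on the other machine without unpacking it.
# (Hypothetical helper; the name and argument order are made up.)
push_tarball() {
    dir=$1
    host=$2
    remote_file=$3
    # 'cat > file' on the remote side just writes the incoming stream to disk
    tar -zcf - "$dir" | ssh "$host" "cat > '$remote_file'"
}

# Example with placeholder host and file name:
#   push_tarball directory/ user@host backup/directory.tar.gz
# To unpack it later on the remote host:
#   tar -zxf backup/directory.tar.gz -C backup/
```

This is handy for backups, where the tarball itself is what you want to keep.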

I tried using bzip2 as a filter as well (the '-j' flag instead of '-z'), but it seems to be too CPU-heavy for the extra compression it gives. This depends on the amount of available bandwidth, though: the slower the link, the more a better compression ratio pays off.
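
To get a feeling for the trade-off on your own data, you can compare how many bytes each filter produces before sending anything. This is a rough sketch, not a benchmark: the sample files are generated on the spot and are far more repetitive than real data, and the helper name compressed_size is made up.

```shell
#!/bin/sh
# Compare how many bytes would go over the wire with different filters.

sample=$(mktemp -d)

# Made-up sample data: 200 small, repetitive text files
i=0
while [ "$i" -lt 200 ]; do
    printf 'line %s of some repetitive sample text\n' "$i" "$i" "$i" > "$sample/file$i.txt"
    i=$((i + 1))
done

# compressed_size FILTER...: tar the sample, pipe it through FILTER, count bytes
compressed_size() {
    tar -cf - -C "$sample" . | "$@" | wc -c | tr -d ' '
}

raw_size=$(compressed_size cat)
gzip_size=$(compressed_size gzip -c)
echo "uncompressed: $raw_size bytes"
echo "gzip:         $gzip_size bytes"

# bzip2 is optional here; skip it when it is not installed
if command -v bzip2 >/dev/null 2>&1; then
    bzip2_size=$(compressed_size bzip2 -c)
    echo "bzip2:        $bzip2_size bytes"
fi
```

Replace the generated files with a directory of your own to get meaningful numbers. In bash, putting 'time' in front of each pipeline also shows how much CPU time each filter costs.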

I found some comparisons at http://smaftoul.wordpress.com/2008/08/19/ssh-scp-little-files-and-tar/, which saw a 19x transfer time improvement when transferring a 109 MB directory containing 9992 files. This is definitely worth a little command line magic!

2 comments:

smaftoul said...

Thanks for mentioning my blog ! ;)

Peter said...

np =)
Thanks for the info!