File managers, parallel copy, and performance hits

March 21st, 2009 at 22:01

Scenario:

I’m using an everyday modern GUI file manager to copy a huge file to my portable harddrive. A few minutes later, while the first transfer is still in progress, I realize that there were actually two huge files I want to copy to my portable harddrive.

What are my options? I could copy the other file right away, or I could babysit the file copy and start the second file when the first one is complete.

The disadvantage to the first option is that the files copy in parallel. Harddrives operate best when used sequentially. Copying the files in parallel reduces the total rate at which data is copied to the harddrive, thanks to the head having to jump around to different parts of the disk. The total time of the operation is significantly increased.

The disadvantage to the second option is that I have to sit here and babysit a process that should be automatic. I’m just copying two files. What if my two files would take 15 minutes each to copy, but I want to do something that is going to take half an hour, and then come back and collect my portable harddrive and then leave immediately?

Another similar scenario: I want to copy two huge files to the same destination, but they’re in different source directories, meaning I can’t select both of them at once.

The solution? I think a queue would work nicely, but while solving one problem it would introduce several others:

  • User confusion: “why aren’t my files copying [right away]?”
  • Having to add a complex UI to otherwise simple and unintrusive file-copy dialogs to handle queues – at the very least, you need to be able to remove the items from a queue
  • Priority/detection of when queues should be used; should queues always be used when the destination device is the same for every file? Should queuing be explicit [Copy -> Paste (add to queue)]? Or maybe it is parallel that should be explicit [Copy->Paste Immediately]?
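
To make the explicit "add to queue" option concrete, here is a minimal shell sketch, assuming a Unix-like system with bash. The function name queue_copy and the LOCKDIR location are hypothetical, made up for illustration; this is not how any existing file manager works. Each copy waits for a shared lock, so only one copy writes at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of any file manager: queue_copy SRC DEST
# Copies run one at a time; a directory serves as the lock because mkdir is atomic.
LOCKDIR="${TMPDIR:-/tmp}/copy-queue.lock"   # assumed location; one lock shared by all copies

queue_copy() {
    local src=$1 dest=$2 status
    # Wait until no other copy holds the lock.
    until mkdir "$LOCKDIR" 2>/dev/null; do
        sleep 1
    done
    cp -- "$src" "$dest"
    status=$?
    rmdir "$LOCKDIR"    # release the lock so a waiting copy can start
    return "$status"
}
```

Running `queue_copy file1 /path/to/usb/drive/ &` and, a few minutes later, `queue_copy file2 /path/to/usb/drive/ &` would start the second copy only after the first has finished. Two caveats: waiting copies are not guaranteed to run in the order they were started, and a copy that is killed midway leaves the lock directory behind, so it would have to be removed by hand. A real queue would also need a per-device lock and the UI questions listed above.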

Why don’t we see a queue feature in modern file managers? Is it because it’s just a hard problem, or is there a good reason not to have it? Or perhaps I’m just using the wrong file manager?

Comments (24)

  1. You’re using the wrong file manager :) .
    Total Commander has had a queue function for copy and move operations for years. Well, I’m assuming here that you’re talking about Windows and are willing to pony up some cash for a powerful FM.

    Comment by http://token.ro/ — March 21, 2009 @ 11:03 pm

  2. If it’s a huge file, as you say, your disk is jumping all over the place anyways (unless you’ve defrag’ed very recently).

    Comment by Tim McClarren — March 21, 2009 @ 11:05 pm

  3. What operating system are you using?

    Anyway, here’s a short demo how GNOME’s nautilus file manager does multi-file copy: http://www.gnome.org/~alexl/nautilus-gio-copy.ogg

    Comment by Priit Laes — March 21, 2009 @ 11:29 pm

  4. Modern OSes use anticipatory elevator algorithms to do disk reads and writes efficiently and extract the maximum performance from the drives, whether you copy one file or ten, and given the right algorithm, you should not see any interactivity impact.

    Sorry, but no cookie. Just copy the two files in parallel; you will experience minimal or no throughput impact.

    Comment by Rudd-O — March 21, 2009 @ 11:33 pm

  5. on windows simply install teracopy.

    it replaces the default explorer behaviours by becoming the registered handler for file operations. it will queue up files, allow retries (including as admin if needed) and give a far more sensible ETA.

    it’s also free.

    Comment by olli — March 22, 2009 @ 12:02 am

  6. Here is your solution:
    http://sourceforge.net/projects/supercopier/

    Comment by hax0r — March 22, 2009 @ 12:41 am

  7. Yes, you should try Total Commander. It is a very powerful and versatile file manager that can satisfy all of your needs.

    Comment by Goo — March 22, 2009 @ 12:53 am

  8. The problem is when file 1 is big and file 2 is small.
    Then the user might expect file 2 to finish as soon as they start the copy but instead they have to wait 15mins until file 1 is finished.

    Prioritizing this isn’t hard, but what is hard is defining big and small. For some users 15 seconds might be a short time, while others think 1 second is.

    But I think that a simple queue wouldn’t be too hard either to code or to understand. Most FTP clients have this, and you can instantly see when a file will be copied and estimate the time it will take.

    Comment by coo — March 22, 2009 @ 1:53 am

  9. One thing to consider is that by the time a queue was implemented in, say, Windows, it would be pointless because SSDs will be so widespread. SSDs don’t suffer a performance hit on random writes, so parallel writes would not be a problem.

    Comment by Joe Carroll — March 22, 2009 @ 3:58 am

  10. Super Copier 2 and TeraCopy both add this functionality to Windows Explorer.

    I prefer Super Copier but it has been abandoned apparently.

    Comment by Andreas — March 22, 2009 @ 4:30 am

  11. I think this was proposed for Ubuntu a while back; something that resembled Firefox’s download manager integrated into the file copy status dialog.

    Comment by Alex — March 22, 2009 @ 5:04 am

  12. One of the first apps I install is the fantastic TeraCopy, which is free for personal use (I bought it anyway, it’s so great) and has queuing and many other features.

    Comment by Dan — March 22, 2009 @ 5:39 am

  13. If you want an even better option, check out teracopy.

    Comment by isosmith — March 22, 2009 @ 6:25 am

  14. There is no need to waste money; Ubuntu does it, and does it the right way.

    Comment by Scorp — March 22, 2009 @ 6:42 am

  15. http://www.codesector.com/teracopy.php

    Comment by Chmunk — March 22, 2009 @ 7:28 am

  16. The idea of queued actions already exists in FileZilla (an FTP client, a file manager of sorts, I guess). I guess the scenarios you describe (i.e. high-latency copies) are more common in FTP land.

    Comment by Paschal — March 22, 2009 @ 8:14 am

  17. With a few large files the difference made by copying in parallel is going to be fairly minimal. The operating system will buffer up large writes and the disk will reorder the writes. Sure, a queue might go a few percentage points faster, but it’s not worth the more complex file manager, which would confuse users when the copies did not happen as they expected.

    Comment by Paul Keeble — March 22, 2009 @ 9:48 am

  18. Reminds me of doing the laundry. I put clothes in the washing machine, then I have to pop back half an hour later to switch them to the dryer. The attendant can do it, though, if I pay her a little.

    Comment by Toby Champion — March 22, 2009 @ 10:13 am

  19. I’m guessing you’re using either Mac OS or Windows, or you’re a non-CLI Linux user, because the solution on the command line in Linux is simple:

    for i in file1 file2; do cp -- "$i" "/path/to/usb/drive/$i"; done

    which will do them sequentially by default, as the second call to cp won’t be made until the first has completed.

    You could argue that you specified you were using a GUI file manager, but I would counter that you should be choosing the right tool for the job.

    Comment by Steve — March 22, 2009 @ 10:55 am

  20. Gah… the CLI solution is terrible, and doesn’t deal with the situation described. The first file copy has already started… you could ^C the copy and start over (not a terrible option if using rsync, I suppose), or ^Z the copy, use “bg” to continue it in the background, and start a new copy, or you can just type ahead, and when the shell finishes the first copy it’ll run the command(s) you’ve typed.

    But as “modern” has been specified, you’re presumably using a ‘modern’ OS, which means that your OS will have buffers, and much of the writing to the external drive will merely fill the disk buffers in the OS, and the OS will (as pointed out above) automatically figure out the best way to write the data to disk — which doesn’t really matter anyway unless you’ve partitioned your external disk into multiple partitions.

    It’s more important to queue network downloads than disk writes.

    Comment by SJS — March 22, 2009 @ 5:13 pm

  21. WHAT

    Comment by Billy O'Smithwickhamshirefordglengarry — March 22, 2009 @ 10:45 pm

  22. @11, this is and has been a part of GNOME, which Ubuntu uses, for quite some time.

    Comment by krs — March 23, 2009 @ 12:26 am

  23. I just have a batch file on my desktop called copy.cmd

    It uses robocopy (free from Microsoft). On Linux/BSD/unix I have one called copy.sh which uses rsync

    I just edit the file, plug in the info and double click. Robocopy is pretty nice.

    @echo off

    set SRC="D:\DVD"
    set DEST="\\fire2\public\dvds"

    robocopy2k3.exe %SRC% %DEST% /E /R:5 /W:1 /X /V /Z /ETA /XN /XO

    Comment by J — March 23, 2009 @ 6:15 am

  24. CLI FTW.

    This may be the fail approach but you could always type while ‘cp’ is copying, hit enter, and as long as the program doesn’t require any further input your command will be run once the previous is finished.

    You could also use ‘at’ and have your second copy command run around the time the first one should be complete. It may not be perfect, but it’s more efficient than copying two files at once and requires no babysitting.

    Comment by Michael — April 26, 2009 @ 12:08 am
