I’m currently watching the progress of a 4tB rsync file transfer, and i’m curious why the speeds are less than the theoretical read/write maximum speeds of the drives involved with the transfer. I know there’s a lot that can effect transfer speeds, so I guess i’m not asking why my transfer itself isn’t going faster. I’m more just curious what the bottlenecks could be typically?

Assuming a file transfer between 2 physical drives, and:

  • Both drives are internal SATA III drives with 5.0GB/s 5.0Gb/s read/write 210Mb/s (this was the mistake: I was reading the sata III protocol speed as the disk speed)
  • files are being transferred using a simple rsync command
  • there are no other processes running

What would be the likely bottlenecks? Could the motherboard/processor likely limit the speed? The available memory? Or the file structure of the files themselves (whether they are fragmented on the volumes or not)?

  • mozz@mbin.grits.dev
    link
    fedilink
    arrow-up
    1
    ·
    5 months ago

    Almost certainly, the bottleneck is one or both of:

    1. The platters can’t simply spin at full speed reading a sequential stream of bytes from one and writing it to another - they periodically have to search around to different places stitching the file’s byte stream together from discontinguous chunks or reading or writing metadata. Seek latency of the platter will overshadow any tiny delays incurred because of memory or CPU delays.

    2. The algorithm is doing something in a fashion that causes delays (e.g. reading each file individually and waiting until it can sort out if it needs to send anything for that file before starting I/O operations for the next).

    Idk if you can do anything about #1 but in similar situations I’ve had good mileage preventing #2 with “tar cj /somewhere | ssh me@host ‘cat | tar xj’” (roughly speaking, you obviously may have to adjust things to make it actually work, and on very fast networks maybe it’s better to skip the -j, but that’s the rough idea).

    Edit: Oh, I misread, is this local? I saw rsync and just though it was a network transfer. What kind of speeds are you getting? Does doing “tar c /original | tar x” or something like that work any faster?