I’m currently watching the progress of a 4TB rsync file transfer, and I’m curious why the speeds are lower than the theoretical maximum read/write speeds of the drives involved. I know a lot of things can affect transfer speeds, so I’m not really asking why my particular transfer isn’t going faster. I’m more curious what the typical bottlenecks are.

Assuming a file transfer between 2 physical drives, and:

  • Both drives are internal SATA III drives rated at 210MB/s read/write (I originally wrote 5.0GB/s, then 5.0Gb/s; my mistake was reading the SATA III protocol speed as the disk speed)
  • Files are being transferred using a simple rsync command
  • There are no other processes running

What would be the likely bottlenecks? Could the motherboard/processor likely limit the speed? The available memory? Or the file structure of the files themselves (whether they are fragmented on the volumes or not)?

  • Max-P@lemmy.max-p.me
    5 months ago

    SATA III runs at 6 gigabits per second (Gb/s), not gigabytes, so the max speed is actually about 600MB/s once encoding overhead is accounted for.
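    To show where 600MB/s comes from: SATA uses 8b/10b line encoding, so every 8 data bits travel as 10 bits on the wire. The sketch below (the helper name is hypothetical) converts each SATA generation's line rate to usable payload, ignoring protocol overhead above the link layer, so real drives see a bit less.

```python
# Usable SATA bandwidth per generation, assuming 8b/10b line encoding
# (8 data bits are sent as 10 line bits). Higher-level protocol overhead
# is ignored, so these are upper bounds.
LINE_RATES_GBPS = {"SATA I": 1.5, "SATA II": 3.0, "SATA III": 6.0}


def usable_mb_per_s(line_rate_gbps: float) -> float:
    """Convert a SATA line rate in Gb/s to usable payload in MB/s."""
    line_bits_per_s = line_rate_gbps * 1e9
    data_bits_per_s = line_bits_per_s * 8 / 10  # strip the 8b/10b encoding
    return data_bits_per_s / 8 / 1e6            # bits -> bytes -> MB


if __name__ == "__main__":
    for name, gbps in LINE_RATES_GBPS.items():
        print(f"{name}: {gbps} Gb/s line rate -> {usable_mb_per_s(gbps):.0f} MB/s")
```

    SATA I, II and III work out to roughly 150, 300 and 600MB/s, which is why even a fast hard drive's sustained rate sits well below the SATA III ceiling.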

    What filesystem? For example, on my ZFS pool I had to let ZFS use a good chunk of my RAM for it to be able to cache things enough that rsync would max out the throughput.

    Rsync doesn’t process files in parallel, so at such speeds the cycle of open file, read chunks, write chunks, close file, repeat can add up. So you want the kernel to buffer as much of it as possible.
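    A minimal sketch of that strictly sequential per-file loop, in Python. It is not how rsync is implemented (rsync also compares file lists and checksums); it only models the loop shape, where every file pays its open/read/write/close cost before the next one can start. The function name `copy_tree_sequentially` is hypothetical.

```python
import os
import time
from pathlib import Path


def copy_tree_sequentially(src, dst, chunk_size=1 << 20):
    """Copy every regular file under src to dst strictly one at a time.

    Each file goes open -> read chunk -> write chunk -> ... -> close before
    the next one starts; nothing overlaps. Permissions and timestamps are
    not preserved (unlike rsync -a).
    """
    src, dst = Path(src), Path(dst)
    stats = {"files": 0, "bytes": 0}
    start = time.perf_counter()
    for root, _dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in sorted(files):
            with open(Path(root) / name, "rb") as fin, \
                    open(dst / rel / name, "wb") as fout:
                while chunk := fin.read(chunk_size):  # read a chunk...
                    fout.write(chunk)                  # ...then write it
                    stats["bytes"] += len(chunk)
            stats["files"] += 1  # both files are closed at this point
    stats["seconds"] = time.perf_counter() - start
    return stats
```

    With many small files, the fixed per-file cost dominates and the disks sit idle between bursts, which matches the spiky disk graphs described below.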

    If you look at the disk graphs of both disks, you probably see a read spike followed by a write spike on the target, instead of a smooth maxed-out curve. In that case the solution is increasing buffers and caching. Depending on the distro, there’s a sysctl, possibly enabled by default, that limits the size of these caches to prevent the “I wrote a 4GB file to my USB stick, now 4GB of RAM is used for it, and it takes hours after the transfer finishes before it’s flushed to the stick” problem.
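    Which sysctl a given distro tunes isn't specified here; the Linux writeback limits under `/proc/sys/vm` (`dirty_ratio`, `dirty_bytes` and their `background` variants) are our best guess at the relevant knobs. This read-only sketch (helper name hypothetical) prints their current values and returns `None` for any it can't read, for example on non-Linux systems.

```python
from pathlib import Path

# Linux writeback limits: how much dirty (not yet flushed) page cache the
# kernel tolerates before it starts or forces writeback. Assumed to be the
# kind of cache-limiting sysctl mentioned above; distros differ.
DIRTY_KNOBS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
)


def read_dirty_limits(proc_vm: Path = Path("/proc/sys/vm")) -> dict:
    """Return current dirty-writeback limits, or None where unavailable."""
    limits = {}
    for knob in DIRTY_KNOBS:
        try:
            limits[knob] = int((proc_vm / knob).read_text().strip())
        except (OSError, ValueError):
            limits[knob] = None  # not Linux, or this knob is missing
    return limits


if __name__ == "__main__":
    for knob, value in read_dirty_limits().items():
        print(f"vm.{knob} = {value}")
```

    Each `*_bytes` knob is the alternative to its `*_ratio` counterpart: only one of the pair is in effect, and the other reads as 0. Changing them (for example with `sysctl -w`) trades RAM for smoother throughput, with the USB-stick flushing delay as the downside.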

    • archomrade [he/him]@midwest.social (OP)
      5 months ago

      SATA III is gigabit, so the max speed is actually 600MB/s.

      My mistake, though still, a 4TB transfer should take less than 2 hours at 5Gb/s (IN THEORY). Thank you @Max_P@lemmy.max-p.me for pointing this out a second time elsewhere: 6Gb/s is what the SATA III interface is capable of, NOT what the DRIVE is capable of. The marketing material for this drive clearly psyched me out; the actual transfer speed is 210MB/s.
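      The arithmetic behind that estimate, as a quick sketch. It assumes decimal terabytes (as drive vendors use) and takes the drive's 210 figure as megabytes per second, the unit drive data sheets normally quote; both are best-case sustained rates with no seeks or per-file overhead.

```python
def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Best-case time for a sustained sequential transfer, in hours."""
    return size_bytes / rate_bytes_per_s / 3600


TB = 1e12  # decimal terabyte, as used on drive labels

print(f"{transfer_hours(4 * TB, 5e9 / 8):.2f} h at 5 Gb/s")    # 1.78 h
print(f"{transfer_hours(4 * TB, 210e6):.2f} h at 210 MB/s")    # 5.29 h
```

      So even with everything else perfect, a 4TB copy at the drive's own rate takes over five hours rather than under two.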

      The filesystem is ext4, shared over SMB… OMV has a fair amount of RAM allocated to it, 16GB or something similarly gratuitous. I’m guessing the way rsync does its transfers is the culprit, and I honestly can’t complain, because the integrity of the transfer is crucial.
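      If integrity is the priority, one independent spot-check after the copy is to hash every file on both sides and compare. This is a minimal sketch (helper names hypothetical), not a description of rsync's internal verification; rsync's real `--checksum` option is related but only changes how rsync decides which files to re-send. Note that it re-reads every byte on both disks, so on 4TB it takes about as long as the copy itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def mismatched_files(src, dst) -> list:
    """Relative paths that are missing under dst or whose contents differ."""
    src, dst = Path(src), Path(dst)
    bad = []
    for s in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = s.relative_to(src)
        d = dst / rel
        if not d.is_file() or sha256_of(s) != sha256_of(d):
            bad.append(rel)
    return bad
```

      An empty list means every source file has a byte-identical copy at the same relative path.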

        • archomrade [he/him]@midwest.social (OP)
          5 months ago

          Thanks, corrected my comment above.

          I’m interested in ksmbd… I chose SMB simply because I was using it across Linux/Windows/Mac devices and I was using OMV to manage it, but that doesn’t mean I couldn’t switch to something better.

          Honestly though, I don’t need faster transfers typically, I just happen to be switching out a drive right now. SMB through OMV has been perfectly sufficient otherwise.

          • d3Xt3r@lemmy.nzM
            5 months ago

            ksmbd is still SMB, except it’s implemented within the Linux kernel. As a result, file transfer speeds can be much better than with plain Samba, which runs entirely in userspace.

            Second, check which SMB protocol version you’re using. Ideally you’d want at least SMB 3; anything older will be painfully slow.
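            On a Samba server, the protocol floor is set by the `server min protocol` parameter in the `[global]` section of smb.conf; values starting with `SMB3` (such as `SMB3` or `SMB3_11`) keep older dialects out. The sketch below is a deliberately simplified parser (function names hypothetical; it ignores `include` files and line continuations), and the dialect a client actually negotiated is better checked with `smbstatus` on the server.

```python
def server_min_protocol(conf_text: str):
    """Return the last 'server min protocol' value in [global], or None."""
    section, value = None, None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            key, val = (part.strip() for part in line.split("=", 1))
            # smb.conf parameter names ignore case and internal whitespace
            if "".join(key.lower().split()) == "serverminprotocol":
                value = val.upper()
    return value


def enforces_smb3(conf_text: str) -> bool:
    """True if the config explicitly refuses dialects older than SMB 3."""
    value = server_min_protocol(conf_text)
    return value is not None and value.startswith("SMB3")
```

            If the parameter is absent, Samba uses its built-in default, which on current releases still admits SMB 2 clients.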

            Finally, I read in your other comment that you’re using spinning disks and a USB dock. Both add significant overhead.

            The IronWolf drive benchmarks at around 250MB/s at the start and slows to around 100MB/s toward the end of the drive (spinning disks get gradually slower as they fill up, because the inner tracks pass less data under the head per revolution than the outer ones). Now add file fragmentation and filesystem overhead (buffers, cluster size allocation, etc.) and speeds could drop considerably.
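            To see what a 250MB/s to 100MB/s falloff means for a full-drive copy, this sketch integrates the time to write each slice of the drive. It assumes, as a simplification, that throughput drops linearly from the first byte to the last (real drives step down by zones, and not exactly linearly), and uses 4TB as the capacity; `fill_time_hours` is a hypothetical name.

```python
def fill_time_hours(capacity_bytes, start_rate, end_rate, steps=10_000):
    """Hours to write the whole drive if throughput (bytes/s) falls linearly
    from start_rate (outer tracks) to end_rate (inner tracks).
    Midpoint-rule integral of d(bytes) / rate(position)."""
    chunk = capacity_bytes / steps
    seconds = 0.0
    for i in range(steps):
        frac = (i + 0.5) / steps  # position across the drive, 0..1
        rate = start_rate + (end_rate - start_rate) * frac
        seconds += chunk / rate
    return seconds / 3600


hours = fill_time_hours(4e12, 250e6, 100e6)
average = 4e12 / (hours * 3600) / 1e6
print(f"{hours:.2f} h, average {average:.0f} MB/s")  # 6.79 h, average 164 MB/s
```

            The effective average (about 164MB/s) is below the simple midpoint of 175MB/s, because the slow inner region costs more time per byte than the fast outer region saves.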

            Then there’s your SATA-to-USB dock. No dock will ever reach 5Gbps; that figure is only the theoretical protocol speed, which borders on false advertising. In reality, you’d see something like under 100MB/s for 128K sequential writes, and if your block size is smaller, expect far slower writes.
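            A rough way to see the block-size effect on your own dock is to time synchronous writes at different block sizes. This sketch forces an `fsync` after every block so the page cache can't hide the device's latency; that is harsher than the queued benchmarks in reviews, so treat results as relative rather than comparable to published numbers. The function name and any scratch path you pass it are hypothetical.

```python
import os
import time


def sync_write_mb_per_s(path, block_size, total_bytes):
    """Write total_bytes to path in block_size pieces, fsync'ing after each
    block, then delete the scratch file. Returns throughput in MB/s."""
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            written += os.write(fd, block)
            os.fsync(fd)  # wait for the device, not the page cache
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.remove(path)  # don't leave the scratch file behind
    return written / elapsed / 1e6
```

            For example, calling it with a scratch file on the dock's mount point at 4K and then 128K blocks (64MB total each) should show the smaller blocks coming out far slower.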

            Combine all of the above and you can imagine just how much slower this whole thing can be.

            For reference, this benchmark shows what’s “normal” for a simple file transfer to a blank drive with no fragmentation: https://www.anandtech.com/show/6014/startechcom-usb-30-to-sata-ide-hdd-docking-station-review/3

  • MNByChoice@midwest.social
    5 months ago

    Looks like you have your answer, but there are a crazy number of possible issues.

    The biggest cause is misreading the performance specs.

    A partial list of other possible causes:
      • Mechanical drives store data in rings. Outer rings have higher speeds than inner ones due to constant angular velocity.
      • Seeks cost a lot of throughput on mechanical drives.
      • Oversubscribed drive cables.
      • HBA issues.
      • PCIe data path conflicts.
      • Slow RAM.
      • RAM full or busy.
      • Extra copies within RAM.
      • NUMA path issues (if drives are connected to different NUMA nodes; not an issue on desktops).
      • CPU too busy.
      • Transfer software doing extra things.
      • Filesystem doing extra work.
      • RAID doing extra work.
      • NIC on a different NUMA node than the HBA (can be good or bad).
      • NIC sharing the data path in a conflicting way.

    There are others. Start with checking theoretical performance from data sheets.
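    Checking data sheets amounts to listing every hop the data crosses and finding the slowest one, since a pipeline can't run faster than its narrowest stage. The figures below are placeholders drawn from this thread (210MB/s drive rate, roughly 600MB/s SATA III, under 100MB/s for a USB dock); substitute your own hardware's numbers. The result is only a ceiling: seeks and per-file overhead push real throughput lower.

```python
# Best-case throughput (MB/s) of each hop a byte crosses. Placeholder
# figures for this sketch; replace them with your data-sheet values.
stages = {
    "source disk sustained read": 210,
    "SATA III link (after 8b/10b)": 600,
    "USB dock (real-world estimate)": 100,
    "target disk sustained write": 210,
}


def bottleneck(stages: dict) -> tuple:
    """A pipeline can't beat its slowest stage; return (name, MB/s)."""
    name = min(stages, key=stages.get)
    return name, stages[name]


name, rate = bottleneck(stages)
print(f"Ceiling: {rate} MB/s, set by {name}")
```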

    Also, details matter, and I don’t have enough of them to guess.

  • mozz@mbin.grits.dev
    5 months ago

    Almost certainly, the bottleneck is one or both of:

    1. The platters can’t simply spin at full speed, reading a sequential stream of bytes from one drive and writing it to the other. The heads periodically have to seek to different places, stitching the file’s byte stream together from discontiguous chunks or reading and writing metadata. Seek latency will overshadow any tiny delays caused by memory or CPU.

    2. The algorithm is doing something in a fashion that causes delays (e.g. reading each file individually and waiting until it can sort out if it needs to send anything for that file before starting I/O operations for the next).
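    A back-of-the-envelope model of point 1: if each file costs one seek plus its sequential read time, effective throughput collapses for small files. The 10ms seek and 210MB/s sequential rate are assumed ballpark figures for a 7200rpm drive, and a real copy seeks on both disks, so this understates the penalty.

```python
def effective_mb_per_s(file_size_bytes, seq_rate_bytes_per_s, seek_s):
    """Throughput when every file costs one seek plus its sequential read."""
    per_file_s = seek_s + file_size_bytes / seq_rate_bytes_per_s
    return file_size_bytes / per_file_s / 1e6


# Assumed: 10 ms average seek, 210 MB/s sustained sequential rate.
for size in (100_000, 1_000_000, 100_000_000):
    rate = effective_mb_per_s(size, 210e6, 0.010)
    print(f"{size / 1e6:>6.1f} MB files: {rate:6.1f} MB/s")
```

    Under these assumptions, 1MB files land around 68MB/s and 100KB files under 10MB/s, while 100MB files get close to the full sequential rate.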

    Idk if you can do anything about #1, but in similar situations I’ve had good mileage preventing #2 with “tar cj /somewhere | ssh me@host 'cat | tar xj'” (roughly speaking; you may have to adjust things to make it actually work, and on very fast networks it may be better to skip the -j, but that’s the rough idea).

    Edit: Oh, I misread, is this local? I saw rsync and just thought it was a network transfer. What kind of speeds are you getting? Does something like “tar c /original | tar x” work any faster?
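    The point of “tar c | tar x” is that packing and unpacking run concurrently, so reading the next file overlaps with writing the previous one. As a rough stand-in that runs without a shell, this sketch uses Python's standard `tarfile` stream modes (`"w|"` and `"r|"`) over an OS pipe, with one thread packing and the main thread unpacking. `tar_pipe_copy` is a hypothetical helper; with the `data` extraction filter it does not try to preserve ownership or special files, and it is not a replacement for rsync's integrity checks.

```python
import os
import tarfile
import threading


def tar_pipe_copy(src: str, dst: str) -> None:
    """Stream src into dst through a pipe, like `tar c | tar x`."""
    read_fd, write_fd = os.pipe()

    def pack() -> None:
        # "w|" is uncompressed stream mode: written sequentially, no seeking.
        with os.fdopen(write_fd, "wb") as sink, \
                tarfile.open(fileobj=sink, mode="w|") as tar:
            tar.add(src, arcname=".")

    packer = threading.Thread(target=pack)
    packer.start()
    with os.fdopen(read_fd, "rb") as source:
        try:
            with tarfile.open(fileobj=source, mode="r|") as tar:
                if hasattr(tarfile, "data_filter"):  # newer Pythons
                    tar.extractall(dst, filter="data")
                else:
                    tar.extractall(dst)
        finally:
            # Drain trailing padding so the packer never blocks on a full pipe.
            while source.read(1 << 16):
                pass
    packer.join()
```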

  • Shdwdrgn@mander.xyz
    5 months ago

    You didn’t mention whether this is an HDD or an SSD. If it’s an HDD, you will never even reach SATA II speeds, although you should be able to saturate SATA I. Realistically, you might be able to push around 200MB/s on newer HDDs, but that’s assuming nothing else gets in your way.