Jump to content

Speed up Handbrake using a ramdrive?


Recommended Posts

Posted

So, due to ripping the new "The Signal" (a trippy movie featuring Lawrence Fishburne) and not paying attention I overwrote the old "The Signal" (a zombie-esque movie). Re-ripping it I got to thinking...is there anyway to speed up the process? Granted the cores were basically pegged in the iso to mkv transcode, but perhaps I could be not feeding it enough data and it could go faster?

This was done on my laptop (a Core i7 3630QM; 2.4 gHz quad-core with hyper threading) running openSUSE 13.2 (kernel 3.16.6-2).

I decided to create a quick and dirty ramdrisk using /dev/shm. The commands for openSUSE are below. Note I executed them as root.

mkdir /run/media/dave_boo/ramdisk && mount -t tmpfs none /run/media/dave_boo/ramdisk -o size=3000m

Next up was to get a baseline. At least in openSUSE you can not use "hdparm -t /dev/shm" as it complains "BLKGETSIZE failed: Inappropriate ioctl for device" and "BLKFLSBUF failed: Inappropriate ioctl for device". This left me with "dd". Of course I first had to transfer the file over to the ramdisk which I did as below using my favourite "rsync".

rsync -avr --progress Signal\,\ The/ /run/media/dave_boo/ramdisk

It copied at 269.1MB/s. A bit low (a problem with rsync not just dumping the information?), but whatever. So I then used "dd" to see the speed of my ramdisk.

dd if=/run/media/dave_boo/ramdisk/The\ Signal.mkv of=/dev/null bs=4M

This was done 5 times and the results averaged out to 6.22 GB/s.

I then tried to dd the original file from my desktop but didn't take into account caching. However, the first run gave me 457 MB/s. Not shabby. I then ran it from my 2.5" portable hard drive and got a speed of 91.8 MB/s. Pretty good for a device that's almost 90% full!

So it was time to test my theory.

I first did the transcode from a 1.7 GB (a lot of that is the audio as I like my surround sound) 720x480 mkv down to the iPod preset in Handbrake (version 0.10.0 (x86_64)) using the file in the ramdisk. I chose a pre-transcoded file and the lightest preset to present the least load to the cpu and shift it towards the storage subsystem thus excessively highlighting any deficiencies. On all 8 threads the usage averaged between 70-77% watching it with htop. The average fps was 386.5.

I then allowed the machine to cool back down (the cores were between 92 and 101 at the end of the run! and I let it cool down to between 48 and 53...the time it took to smoke a cig). I then ran the same presets off the ssd. CPU usage actually dropped down to 65-75%. Odd. Average fps was 379.3.

Last up was from my usb hdd. Of course I let it cool down again and started the transcode all over. This also fluctuated between 65-75% usage. Average fps was

So it appears there may be a 2% increase in speed using a ramdisk over a fast sad (and just a bit more over using a usb hdd), but I personally would consider that within the margin of error. So it appears my theory that the application is not being fed enough data was incorrect. The lack of the cores being pegged has me worried however. I don't know the reason for that; perhaps there's a single threaded programme running that the main x264 is waiting on?

  • ramdisk...6.220 GB/s....386.5fps
  • ssd..........0.457 GB/s....379.3fps
  • usb..........0.092 GB/s....377.3fps

Over the next couple of days I plan on testing various different combinations; such as how does the core count (thread parking) affect scaling? If there's any requests for benchmarks in this vein I'd be happy to fit them in.

Posted

Wow, you have WAY to much free time on your hands! [emoji6]

(but thanks for the tip on 'the signal', love fishburn... downloading now)

sent from my slimkat 1+ using tapatalk

Posted

Wow, you have WAY to much free time on your hands! [emoji6]

(but thanks for the tip on 'the signal', love fishburn... downloading now)

sent from my slimkat 1+ using tapatalk

It only took 6 minutes 35 seconds per transcode to be fair!

It actually kept interrupting my side missions in Borderlands the Pre Sequel!

**edit**

If I actually had too much time on my hands I would have ran each transcode numerous times and averaged the results....coffee1.gif

  • Like 1
Posted

I decided to a TL;DR for those with ADD/HD (or those that don't want to scan through a wall of text). I think the easiest is to present the graph of my findings. Please note that "R" means "real" core and "F" means "fake" core (or the hyper threading part of the core).

What I found interesting was the fact that Intel seems to have done a good job about not having the 'fake' cores (hyper threading) give up too much performance if they are used by themselves. For instance, if you were to look at the 1R versus the 1F there's not that much of a spread. But when both have to compete there is nowhere close to a doubling of fps as there is going from 1R -> 2R or even 1F -> 2F.

To go from 1 core to 2 core there is a 186% increase in fps. To go from a 1 core to a 3 core there is a 246% increase in fps. To go from a 1 core to a 4 core there is a 301% increase in fps. Going all the way from 1 core to hyper threading quad core is a 310% increase in fps. As such I can't really see the point in doing single transcodes on anything with more than 4 cores. The scaling just isn't there to justify it. However, if using something like an AMD Bulldozer or better processor, one could quite conceivably run 2 handbrake jobs concurrently with little to no decrease in fps to either. This should be faster than transcoding them serially.

If one were to not need all the other advantages that an i7 offers over an i5, it could be argued that the 3% increase in fps just does not justify purchasing an i7 for transcoding.

Everything below is for those interested in the process.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~STOP READING HERE UNLESS YOU'RE A MASOCHIST~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ok, in preparation of the thread parking (easier to do than core parking), here's a list of programmes you need.

  • sensors (used to monitor temperatures)

  • taskset (used to force the programme to only use certain 'cores')

  • cpupower (used to monitor turbo speeds)

  • htop (used to monitor % of cpu usage


I'm obviously a fan of cli programmes, and perhaps there are gui versions, but I simply don't care. My computer, my post, my prerogative. Here's the command on openSUSE to install all of the above. It of course assumes that you have Handbrake (http://handbrake.fr) installed and are installing as root

zypper in sensors taskset cpupower htop

Or for those on Debian (yes, Mint and Ubuntu are derived from Debian!) here's the commands for you.

sudo apt-get install lm-sensors taskset cpupower htop

Let's look at those for a minute. lm-sensors needs to be configured. It's easy to do if you simply run the below. Not using "--auto" means you have to answer "y" or "n" for each sensor it thinks it finds....quite annoying.

sudo sensors-detect --auto

You can than simply run "sensors" (sudo/root not needed) to display the temperatures of your processor. It appears to only list real cores (no hyper threaded 'cores') on the 3630QM. Of course YMMV.

taskset needs no configuration, but a bit of understanding. To restrict the programme to the cores, you first open Handbrake. Then you find out its pid.

pgrep ghb

For me the pid was 2164. You then stuff it into the below (root not needed if Handbrake was launched as regular user).

taskset -pc 0,1 2164

So, let's work from right to left. The 2164 is obvious, it's the pid you got for Handbrake. The 0,1 are the 'cores' you want to limit the programme to. A word of note; 0,2,4,6 are 'real cores' and 1,3,5,7 are 'fake cores' as they are the hyper threading components. You can limit the programme using the numbers individually (0,1,2,3...) or using using a range (0-3).

cpupower needs no configuration either. The way I use it is below.

cpupower monitor

It spits out a bunch of stats that are handy. You can also use below to see the max turbo for your cores. as well as your governor. Note that if it says "The governor "performance" may decide which speed to use within this range." your cpu will be running as fast as it can. You can set it below (after benchmarking or transcoding set it back to "powersave" to keep it cool).

cpupower frequency-set -g performance

htop doesn't need any configuration either; simply run it to see what's going on. This makes sure that your programme is parked where you told it to be and you can see what % of the processor it is using.

I open up three terminals side by side so that I can see what's going on. One has htop, one has cpupower and the final has sensors. Handbrake is resized to fit underneath in the bottom part of the screen.

So, let's get into it shall we?

First up is core0. Remember that this is a 'real' core, so it can be compared to the old single core processors.

  • turbo speed............3.4, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........80, 72, 63, 60 C
  • % usage of cores...100, 0, 0, 0, 0, 0, 0, 0 %
  • fps..........................123.3

Next is core1. This is the first core's hyper threading unit, so we can expect it to be slowest of all.

  • turbo speed............3.2, 3.4, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........83, 74, 72, 64, 61 C
  • % usage of cores...0, 100, 0, 0, 0, 0, 0, 0 %
  • fps..........................122.4

Next is core0,1. This is the first core 'real' and 'fake'; meaning an actual core + hyper threading.

  • turbo speed............3.4, 3.4, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........84, 71, 65, 59 C
  • % usage of cores...98, 98, 0, 0, 0, 0, 0, 0 %
  • fps..........................149.2

Next is core0,2. This compares 2 'real' cores against the 'real + fake' hyper threading (second test).

  • turbo speed............3.3, 3.2, 3.3, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........86, 87, 71, 62 C
  • % usage of cores...97, 0, 97, 0, 0, 0, 0, 0 %
  • fps..........................229.6

Next is core1,3. This compares 2 'fake' (hyper threading) cores against the 'real + fake' and 'real + real' (second and third tests respectively).

  • turbo speed............3.2, 3.2, 3.2, 3.3, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........89, 90, 74, 66 C
  • % usage of cores...0, 97, 0, 97, 0, 0, 0, 0 %
  • fps..........................229.2

Next is core0,2,4. This is 3 'real' cores.

  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........88, 94, 90, 74 C
  • % usage of cores... 87, 0, 87, 0, 87, 0, 0, 0 %
  • fps..........................302.9

Next is core0,1,2. This is 2 'real' cores and 1 'fake' core.

  • turbo speed............3.3, 3.3, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........92, 91, 74, 66 C
  • % usage of cores...93, 94, 85, 0, 0, 0, 0, 0 %
  • fps..........................243.9

Next is core0,1,3. This is 1 'real' and 2 'fake' cores.

  • turbo speed............3.3, 3.3, 3.2, 3.3, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........80,72,63,60 C
  • % usage of cores...96, 96, 0, 87, 0, 0, 0, 0 %
  • fps..........................243.4

Next is core1,3,5. This is 3 'fake' cores.

  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........89, 92, 89, 74 C
  • % usage of cores...0, 86, 0, 86, 0, 86, 0, 0 %
  • fps..........................299.2

Next is core0,2,4,6. This is 4 'real' cores.

  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........93, 99, 98, 92 C
  • % usage of cores...83, 0, 83, 0, 84, 0, 82,0 %
  • fps..........................371.0

Next is core0,1,2,4. This is 3 'real' and 1 'fake' core.


  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........95, 96, 92, 76 C
  • % usage of cores...87, 87, 84, 0, 82, 0, 0, 0 %
  • fps..........................325.2

Next is core0,1,2,3. This is 2 'real' and 2 'fake' cores.


  • turbo speed............3.3, 3.3, 3.3, 3.3, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........91, 95, 77, 66 C
  • % usage of cores...85, 87, 83, 84, 0, 0, 0, 0 %
  • fps..........................267.6

Next is core0,1,3,5. This is 1 'real' and 3 'fake' cores.


  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........91, 92, 89, 72 C
  • % usage of cores...86, 86, 0, 82, 0, 82, 0, 0 %
  • fps..........................319

Next is core1,3,5,7. This is 4 'fake' cores.

  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........91, 98, 98, 94 C
  • % usage of cores...0, 83, 0, 83, 0, 82, 0, 82 %
  • fps..........................368.2

Screw the 5,6, and 7 threads (that's another 12! tests), this is taking too long. Extrapolate yourself, but know that all the cores on my system will run at a maximum of 3.2 gHz and will peg the temperatures. Due to the apparent ability of the Core architecture to use a 'fake' core as mostly a real core, as shown above, I don't think there's much point in it.

The final is core0-8. This is the whole chip being used.

  • turbo speed............3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 gHz
  • temperatures..........91, 97, 98, 92 C
  • % usage of cores...79, 77, 78, 75, 73, 78, 80, 72 %
  • fps..........................382.7

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.



×
×
  • Create New...