seeking example pipeline from side by side (SBS) MP4 to multiview MV-HEVC MOV

Issue #985 open
Matthew Arnison created an issue

I am attempting to use FFMPEG and x265 to encode an MV-HEVC video and I have some questions.

  1. Is there a sample command line for using FFMPEG to prepare the input for x265 and then package the result into QuickTime for playback on iOS and Apple Vision Pro (AVP)?
  2. Is there a sample video that is known to work as input to x265 for MV-HEVC encoding? (I tried to open your test harness repo from your developer wiki but got a “not found” message.)
  3. Are there any special restrictions on the input encoding that x265 will accept? I have already tried both rawvideo YUV and Y4M without success.

I initially started by following this blog:

https://medium.com/@handshoe/how-to-encode-mv-hevc-video-with-ffmpeg-c4498ce5b9ec

I have also read the doc for the x265 CLI at:

https://x265.readthedocs.io/en/master/cli.html

I am using FFMPEG 7.1.1 and x265 4.1 built with ENABLE_MULTIVIEW on Linux (Ubuntu 24.04 inside WSL2 under Windows 11).

However, I am getting stuttering frames in the output video and the output video only has 1 layer instead of 2.

Here are my command lines for my Y4M attempt:

ffmpeg -i parents_687468ad_rgb_sbs.mp4 -f yuv4mpegpipe parents_687468ad_rgb_sbs.y4m
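One way to confirm what the intermediate file contains before handing it to x265 is to read the Y4M stream header. The sketch below is a minimal Python parser that assumes the standard YUV4MPEG2 layout (a first line of space-separated tags such as W, H, F and C); the function names are hypothetical, and only the width, height, frame rate and colour-space tags are decoded.

```python
def parse_y4m_header(line: bytes) -> dict:
    """Decode the first line of a YUV4MPEG2 stream into a dict."""
    tokens = line.decode("ascii").strip().split(" ")
    if tokens[0] != "YUV4MPEG2":
        raise ValueError("not a YUV4MPEG2 stream")
    info = {}
    for tok in tokens[1:]:
        if not tok:
            continue
        tag, value = tok[0], tok[1:]
        if tag == "W":
            info["width"] = int(value)
        elif tag == "H":
            info["height"] = int(value)
        elif tag == "F":
            num, den = value.split(":")
            info["fps"] = int(num) / int(den)
        elif tag == "C":
            info["colorspace"] = value
        # Other tags (I = interlacing, A = aspect ratio, X = extensions) are ignored here.
    return info


def read_y4m_header(path: str) -> dict:
    """Read only the header line of a .y4m file (hypothetical helper)."""
    with open(path, "rb") as f:
        return parse_y4m_header(f.readline())
```

For example, read_y4m_header("parents_687468ad_rgb_sbs.y4m") should report the full packed frame width of 2560, matching what ffprobe shows for that file.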

x265 --multiview-config mv_config.cfg --fps 24 --input-res 1280x768 --output parents_687468ad_rgb_mv.hevc --profile main10 --colorprim bt709 --transfer bt709 --colormatrix bt709

And my config file:

#Configure number of views in the multiview input video#
#--num-views <integer>#
--num-views 2

#Configuration for the input format of the video#
#--format <integer>#
# 0 : Two separate input frames#
# 1 : One input frame with left and right view#
# 2 : One input frame with top and bottom view#
--format 1

#Configure input file path for each view#
##NOTE:Other input parameters such as input-csp/input-depth/fps must be configured through CLI##
--input "parents_687468ad_rgb_sbs.y4m"
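Writing the config file by hand is easy to get subtly wrong, so a small generator can help. This is a minimal Python sketch that emits only the three keys used above (--num-views, --format, --input); the layout names are hypothetical labels for the numeric --format values, and the rule that format 0 takes one --input line per view is an assumption based on the config comment "input file path for each view".

```python
# Numeric --format values, as listed in the comments of the config file above.
FORMATS = {"separate": 0, "side_by_side": 1, "top_bottom": 2}


def make_multiview_config(inputs: list[str], layout: str = "side_by_side",
                          num_views: int = 2) -> str:
    """Return the text of an x265 --multiview-config file (hypothetical helper)."""
    if layout not in FORMATS:
        raise ValueError(f"unknown layout: {layout!r}")
    fmt = FORMATS[layout]
    # Assumption: format 0 needs one --input per view; packed formats need one file.
    expected = num_views if fmt == 0 else 1
    if len(inputs) != expected:
        raise ValueError(f"layout {layout!r} expects {expected} input file(s)")
    lines = [f"--num-views {num_views}", f"--format {fmt}"]
    lines += [f'--input "{path}"' for path in inputs]
    return "\n".join(lines) + "\n"


print(make_multiview_config(["parents_687468ad_rgb_sbs.y4m"]), end="")
```

The generated text can then be written to mv_config.cfg and passed to x265 with --multiview-config.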

Here is the output of ffprobe on the input to x265:

Input #0, yuv4mpegpipe, from 'parents_687468ad_rgb_sbs.y4m':
  Duration: 00:00:03.88, start: 0.000000, bitrate: 566232 kb/s
  Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p(progressive), 2560x768, 24 fps, 24 tbr, 24 tbn
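The ffprobe figures can be cross-checked with a little arithmetic: uncompressed 4:2:0 video stores one full-resolution luma plane plus two quarter-resolution chroma planes, i.e. 1.5 samples per pixel. The Python sketch below (function names are illustrative) reproduces the reported rate for an 8-bit 2560x768 stream at 24 fps; the remaining ~1 kb/s difference is consistent with the 6-byte FRAME header that Y4M places before each frame.

```python
def raw_yuv420_kbps(width: int, height: int, fps: float, bit_depth: int = 8) -> float:
    """Bitrate of uncompressed planar 4:2:0 video in kb/s (1 kb = 1000 bits)."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2  # >8-bit samples are stored in 16 bits
    samples_per_frame = width * height * 3 // 2    # Y plane + two quarter-size chroma planes
    return samples_per_frame * bytes_per_sample * 8 * fps / 1000


def duration_seconds(frames: int, fps: float) -> float:
    return frames / fps


print(round(raw_yuv420_kbps(2560, 768, 24)))      # 566231
print(round(raw_yuv420_kbps(2560, 768, 24, 10)))  # 1132462
print(duration_seconds(93, 24))                   # 3.875
```

A 10-bit source of the same size would run at roughly twice that rate, so the observed 566232 kb/s points to an 8-bit input, consistent with the i420p8 reported by x265, and 93 frames at 24 fps gives the 3.88 s duration.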

Here is the runtime output from x265:

$ x265 --multiview-config mv_config_parents.cfg --fps 24 --input-res 1280x768 --output parents_687468ad_rgb_mv.hevc --profile main10 --colorprim bt709 --transfer bt709 --colormatrix bt709
x265 [warning]: falling back to default bit-depth
y4m  [info]: 1280x768 fps 24000/1000 i420p8 frames 0 - 92 of 93
raw  [info]: output file: parents_687468ad_rgb_mv.hevc
x265 [info]: HEVC encoder version 4.1+110-0e0eee580
x265 [info]: build info [Linux][GCC 13.3.0][64 bit] 8bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3.1 (Main tier)
x265 [info]: Thread pool created using 24 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 4 / wpp(12 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias  : 24 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip mode=1 signhide tmvp
x265 [info]: tools: b-intra strong-intra-smoothing lslices=4 deblock sao
x265 [info]: tools: multi-view
x265 [info]: frame I:      1, Avg QP:27.26  kb/s: 5630.78
x265 [info]: frame P:     24, Avg QP:26.99  kb/s: 2371.47
x265 [info]: frame B:     68, Avg QP:34.58  kb/s: 204.98
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%

encoded 93 frames in 1.46s (63.82 fps), 822.42 kb/s, Avg QP:32.54
x265 [info]: frame P:     25, Avg QP:27.09  kb/s: 1513.70
x265 [info]: frame B:     68, Avg QP:34.55  kb/s: 163.87
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%

encoded 93 frames in 1.46s (63.82 fps), 526.73 kb/s, Avg QP:32.55
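The x265 log above contains two frame-type summaries: the first includes the single I-frame, while the second has only P- and B-frames. One plausible reading, not confirmed anywhere in this thread, is that these are per-view statistics, with the second view predicted from the first. The Python sketch below (a hypothetical helper that matches only the "frame X:" lines shown here) splits the summaries and totals each one.

```python
import re

# Matches summary lines such as "frame I:      1, Avg QP:27.26  kb/s: 5630.78".
FRAME_LINE = re.compile(r"frame ([IPB]):\s+(\d+), Avg QP:\s*([\d.]+)\s+kb/s:\s*([\d.]+)")


def frame_summaries(log: str) -> list[dict]:
    """Split x265 'frame X:' lines into blocks; a repeated frame type starts a new block."""
    blocks, current = [], {}
    for m in FRAME_LINE.finditer(log):
        ftype, count = m.group(1), int(m.group(2))
        if ftype in current:
            blocks.append(current)
            current = {}
        current[ftype] = count
    if current:
        blocks.append(current)
    return blocks


LOG = """\
x265 [info]: frame I:      1, Avg QP:27.26  kb/s: 5630.78
x265 [info]: frame P:     24, Avg QP:26.99  kb/s: 2371.47
x265 [info]: frame B:     68, Avg QP:34.58  kb/s: 204.98
x265 [info]: frame P:     25, Avg QP:27.09  kb/s: 1513.70
x265 [info]: frame B:     68, Avg QP:34.55  kb/s: 163.87
"""

blocks = frame_summaries(LOG)
print(blocks)                             # [{'I': 1, 'P': 24, 'B': 68}, {'P': 25, 'B': 68}]
print([sum(b.values()) for b in blocks])  # [93, 93]
```

Each block totals 93 frames, matching the 93 input frames that x265 reported reading.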

Here is the output from ffprobe on the x265 output:

[hevc @ 0x56257da4a500] Format hevc detected only with low score of 1, misdetection possible!
Input #0, hevc, from 'parents_687468ad_rgb_mv.hevc':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc (Main), yuv420p(tv, bt709), 1280x768, 25 fps, 24 tbr, 1200k tbn

And finally here is the output when converting this to a QuickTime container:

$ ffmpeg -i parents_687468ad_rgb_mv.hevc -c copy -tag:v hvc1 parents_687468ad_rgb_mv.mov
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
  configuration:
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.101 / 61. 19.101
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
[hevc @ 0x55ea87df7800] Format hevc detected only with low score of 1, misdetection possible!
Input #0, hevc, from 'parents_687468ad_rgb_mv.hevc':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc (Main), yuv420p(tv, bt709), 1280x768, 25 fps, 24 tbr, 1200k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
[mov @ 0x55ea87e78f80] WARNING codec timebase is very high. If duration is too long,
file may not be playable by quicktime. Specify a shorter timebase
or choose different container.
Output #0, mov, to 'parents_687468ad_rgb_mv.mov':
  Metadata:
    encoder         : Lavf61.7.100
  Stream #0:0: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 1280x768, q=2-31, 25 fps, 24 tbr, 1200k tbn
Press [q] to stop, [?] for help
[mov @ 0x55ea87e78f80] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mov @ 0x55ea87e78f80] pts has no value
    Last message repeated 92 times
[out#0/mov @ 0x55ea87f29280] video:641KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.299674%
frame=   93 fps=0.0 q=-1.0 Lsize=     643KiB time=N/A bitrate=N/A speed=N/A
$ ffprobe parents_687468ad_rgb_mv.mov
ffprobe version 7.1.1 Copyright (c) 2007-2025 the FFmpeg developers
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'parents_687468ad_rgb_mv.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf61.7.100
  Duration: 00:00:03.80, start: 0.000000, bitrate: 1388 kb/s
  Stream #0:0[0x1]: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 1280x768, 1355 kb/s, 24 fps, 24 tbr, 1200k tbn (default)
      Metadata:
        handler_name    : VideoHandler
        vendor_id       : FFMP

When I play this video on an iPhone, it is not recognised as a spatial video and the playback stutters badly.

Also, when I run ffprobe on spatial videos that play correctly on an iPhone, the output includes (multilayer) on the stream line:

$ ffprobe 687468ad_b454_417a_a966_c2aea5094109_spatial.mov
ffprobe version 7.1.1 Copyright (c) 2007-2025 the FFmpeg developers
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '687468ad_b454_417a_a966_c2aea5094109_spatial.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    creation_time   : 2024-10-23T12:49:59.000000Z
  Duration: 00:00:03.95, start: 0.000000, bitrate: 25488 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 1280x768, 25476 kb/s, 23.57 fps, 23.58 tbr, 600 tbn (default) (multilayer)
      Metadata:
        creation_time   : 2024-10-23T12:49:59.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]
        encoder         : HEVC
      Side data:
        stereo3d: unspecified, view: packed, primary eye: left, baseline: 19240, horizontal_disparity_adjustment: 0.0200, horizontal_field_of_view: 63.400
        spherical: rectilinear
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 2 kb/s (default)
      Metadata:
        creation_time   : 2024-10-23T12:49:59.000000Z
        handler_name    : Core Media Audio
        vendor_id       : [0][0][0][0]
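The working file differs from the x265 output not only in the (multilayer) flag but also in its stereo3d and spherical side data. To compare a reference file against a new encode, the ffprobe side-data line can be split into key/value pairs. This Python sketch parses ffprobe's human-readable text as shown here, which is not a stable interface and may change between ffmpeg versions; the key names side_data and type are hypothetical labels for the unnamed leading fields.

```python
def parse_side_data(line: str) -> dict:
    """Split an ffprobe side-data line such as 'stereo3d: unspecified, view: packed, ...'."""
    name, _, rest = line.strip().partition(": ")
    parts = [p.strip() for p in rest.split(",")]
    # The first field after the side-data name has no key of its own.
    fields = {"side_data": name, "type": parts[0]}
    for part in parts[1:]:
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields


reference = parse_side_data(
    "stereo3d: unspecified, view: packed, primary eye: left, baseline: 19240, "
    "horizontal_disparity_adjustment: 0.0200, horizontal_field_of_view: 63.400"
)
print(reference["view"], reference["primary eye"], reference["horizontal_field_of_view"])
# packed left 63.400
```

Running the same parser over the ffprobe output of a new encode and diffing the two dicts shows which metadata fields are still missing.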

Comments (12)

  1. Lavanya Murugan

    Hi Matthew,
    I went through the issue you reported, and I have provided detailed steps below.

    Step 1: Encode using multiview.

    Build x265 with multiview enabled (ENABLE_MULTIVIEW).

    ./x265 --multiview-config mv_config.cfg --fps 24 --input-res 2732x768 --output vimeo-top-bottom-1440x1620yuv.hevc --profile main --colorprim bt709 --transfer bt709 --colormatrix bt709

    mv_config.cfg (left-right view)

    --num-views 2
    --format 1
    --input "vimeo-left-right-3840x1080.yuv"

    mv_config.cfg (top-bottom view)

    --num-views 2
    --format 2
    --input "vimeo-top-bottom-3840x1080.yuv"
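For packed input, the per-view size is simply the packed frame split along one axis. The value passed to --input-res differs between the commands in this thread, and which size x265 expects for formats 1 and 2 is not settled here, so a small Python helper that computes the per-view size explicitly may help. It is a sketch, and the function name is hypothetical.

```python
def per_view_resolution(width: int, height: int, fmt: int) -> tuple[int, int]:
    """Size of one view, given the packed frame size and an x265 --format value.

    0 = separate input per view, 1 = left/right (side by side), 2 = top/bottom.
    """
    if fmt == 0:
        return width, height              # each input already holds a single view
    if fmt == 1:
        if width % 2:
            raise ValueError("side-by-side frame width must be even")
        return width // 2, height         # two views packed horizontally
    if fmt == 2:
        if height % 2:
            raise ValueError("top-bottom frame height must be even")
        return width, height // 2         # two views stacked vertically
    raise ValueError(f"unknown --format value: {fmt}")


print(per_view_resolution(2560, 768, 1))   # (1280, 768)
print(per_view_resolution(3840, 1080, 1))  # (1920, 1080)
print(per_view_resolution(3840, 1080, 2))  # (3840, 540)
```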

    Step 2:

    Convert the generated HEVC file to MOV using mp4box.exe so that the MOV file is recognised as multiview.

    When we convert directly with ffmpeg, the output is not recognised as multiview.

    Below is the command line for the conversion:

    mp4box.exe -add "vimeo-top-bottom-1440x1620yuv.hevc" -new "vimeo-top-bottom-1440x1620yuv.mov"
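On Linux (including the GPAC Docker image mentioned later in this thread) the binary is usually named MP4Box rather than mp4box.exe. The Python sketch below only wraps the Step 2 command line; it does not reproduce the customisations Lavanya describes, so a stock MP4Box may produce a file that plays but is not recognised as spatial. The function names are hypothetical.

```python
import shutil
import subprocess


def mp4box_command(hevc_path: str, mov_path: str, exe: str = "MP4Box") -> list[str]:
    """Argument list equivalent to: MP4Box -add <hevc> -new <mov>."""
    return [exe, "-add", hevc_path, "-new", mov_path]


def package_with_mp4box(hevc_path: str, mov_path: str) -> None:
    # Look for the Linux/macOS name first, then the Windows executable name.
    exe = shutil.which("MP4Box") or shutil.which("mp4box.exe")
    if exe is None:
        raise FileNotFoundError("MP4Box not found on PATH")
    # check=True raises CalledProcessError if MP4Box exits with an error.
    subprocess.run(mp4box_command(hevc_path, mov_path, exe), check=True)


print(mp4box_command("vimeo-top-bottom-1440x1620yuv.hevc", "vimeo-top-bottom-1440x1620yuv.mov"))
```

Passing an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.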

    Step 3:

    Check the generated MOV file with ffprobe; it should be reported as multilayer.
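Step 3 can be scripted. The Python sketch below assumes, as in the ffprobe output earlier in this thread, that ffprobe 7.1 appends (multilayer) to the stream line of an MV-HEVC track; that is human-readable output rather than a stable interface, and the helper names are hypothetical.

```python
import subprocess


def is_multilayer(ffprobe_text: str) -> bool:
    """True if any video stream line in ffprobe's output carries the '(multilayer)' tag."""
    return any("Video:" in line and "(multilayer)" in line
               for line in ffprobe_text.splitlines())


def probe(path: str) -> str:
    # ffprobe writes its human-readable stream summary to stderr, not stdout.
    result = subprocess.run(["ffprobe", "-hide_banner", path],
                            capture_output=True, text=True, check=True)
    return result.stderr
```

Typical use would be is_multilayer(probe("vimeo-top-bottom-1440x1620yuv.mov")), which should return True for the MP4Box output and False for the file packaged directly with ffmpeg.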

    Step 4:

    To check playback on a Mac, play the video in QuickTime Player with the spatial option selected.

    Here, we used a modified version of the open-source MP4Box software to create the MOV container for MV-HEVC, so that the video could be displayed on Apple systems. So this is not an issue with the MV-HEVC feature in x265. Also, ffmpeg does not support creating a MOV container for MV-HEVC.

  2. Lavanya Murugan

    Hi Matthew,
    I have attached the SBS-format and TB-format internal videos that were used for MV-HEVC encoding. Please request access so that I can grant you permission to view them.
    The mp4box.exe in the folder is a Windows executable, so you can use it on a Windows machine to convert from HEVC to MOV.
    MV-HEVC files

  3. Matthew Arnison reporter

    Thanks a lot for sharing your sample files. I tried to access the folder but I got an error:

    The account needs to be added as an external user in the tenant first.

  4. Matthew Arnison reporter

    Many thanks for your recipe using MP4Box. I used the latest Docker image of GPAC to run MP4Box on the output of x265 and create a MOV file.

    The resulting video is better than using ffmpeg to package the MOV file. The ffmpeg version had severe stuttering. The MP4Box version plays smoothly.

    However, when I open the MP4Box MOV on an iPhone, it is not recognised as a spatial video. This is important for our user workflow.

    It sounds like you have customised the source code of MP4Box to add the required spatial metadata.

    I found this blog which seems to address this issue so I will try this next:

    https://brilly.tv/spatial-video-guide.html

  5. Lavanya Murugan

    Great to hear, Matthew.
    Our MP4Box is customized to handle MV-HEVC.
    Please let us know how it goes once you try the approach in the new blog post.

  6. Dean Z

    Hi Matthew and Lavanya,
    I just noticed this issue and wanted to share that I’ve created a similar workflow for Windows 10/11. It uses some of the same tools but is intended for creating high-resolution Immersive VR180 videos for the Apple Vision Pro (up to 8K per eye). In my use case, the source videos are created with VR180 cameras such as the Canon R5 Mark II and the Canon R7 with dual fisheye lenses, and with a Blackmagic Cine series camera. This requires different metadata in the vexu and hfov atoms. I think a general workflow is needed that lets users customize the metadata for their specific video formats.

    One issue I have encountered is very slow HEVC encoding with x265. I suspect this is due to the large VR180 frame sizes and the use of CPU encoding. I’m not familiar with how to optimize this with x265, or whether it’s possible to leverage NVIDIA CUDA for encoding. I’ll create a separate issue about this, and I would appreciate guidance on how to tune the performance. Thank you.

  7. Matthew Arnison reporter

    Hi Dean, thanks for sharing your experience. I agree it would be helpful to be able to adjust the stereo viewing metadata (such as the vexu and hfov atoms) without needing a macOS-specific tool like Mike Swanson’s spatial tool.

  8. Dean Z

    @Lavanya Murugan I have also been using mp4box. I think you said that you are using a modified version of it. Could I ask what you changed and why it was needed? Thank you.
