    Go Back   Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

    Old 20th February 2019, 10:03   #6741  |  Link
    Registered User
     
    Selur's Avatar
     
    Join Date: Oct 2001
    Location: Germany
    Posts: 5,765
    Nope, already answered that he simply uses media-autobuild suite (see://www.zs-x.com/showthread.p...45#post1866145), so no profiling.
    __________________
    Hybrid here in the forum, homepage
    Selur is offline   Reply With Quote
    Old 20th February 2019, 10:31   #6742  |  Link
    Pig on the wing
     
    Boulder's Avatar
     
    Join Date: Mar 2002
    Location: Hollola, Finland
    Posts: 4,497
    Some build log could be useful, maybe there is something missing. I'd expect that if assembler was not used, the difference would be much bigger though.
    __________________
    And if the band you're in starts playing different tunes
    I'll see you on the dark side of the Moon...
    Boulder is offline   Reply With Quote
    Old 20th February 2019, 10:44   #6743  |  Link
    Helenium(Easter)
     
    Wolfberry's Avatar
     
    Join Date: Aug 2017
    Location: Hsinchu, Taiwan
    Posts: 65
    media-suite_compile.sh#L1346

    Assembly is explicitly turned off in 32-bit builds. The default CFLAGS in the suite are "-mthreads -mtune=generic -O2 -pipe".

    The suite uses GCC 7.4.0 for 32bit builds
    __________________
    RAC#1-6

    Last edited by Wolfberry; 20th February 2019 at 10:48.
    Wolfberry is offline   Reply With Quote
    Old 20th February 2019, 15:35   #6744  |  Link
    Helenium(Easter)
     
    Wolfberry's Avatar
     
    Join Date: Aug 2017
    Location: Hsinchu, Taiwan
    Posts: 65
    x265-v3.0_Au+7-cb3e172a5f51 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit]

    Redistributable Libraries for Intel® C++

    Supply --svt in the command line to use the SVT-HEVC encoder.
    __________________
    RAC#1-6

    Last edited by Wolfberry; 20th February 2019 at 15:42.
    Wolfberry is offline   Reply With Quote
    Old 20th February 2019, 21:24   #6745  |  Link
    Registered User
     
    Join Date: Sep 2018
    Posts: 10
    i tried hard with GCC again.

    seconds. lower is better.

    Code:
    137.0 no assembly
     47.0 (default)
     45.5 (PGO build) -mtune=ivybridge (default here is -O3 which makes 1st pass PGO .exe crash, thus no better speed i guess)
     44.5 (PGO build) -mtune=ivybridge -O2
     43.9 (PGO build) -mtune=ivybridge -funroll-loops -finline-functions -ftree-loop-vectorize -O2
     39.5 LigH
    so i get little improvement with all that fiddling, but still far away from LigH's GCC builds.
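    To put the timings above in relative terms, here is a minimal Python sketch. Only the numbers come from the post; the dictionary labels and the helper name `slowdown_vs` are shorthand introduced for illustration.

```python
from __future__ import annotations

# Timings in seconds copied from the post above (lower is better).
# The labels are abbreviations made up for this sketch.
timings = {
    "no assembly": 137.0,
    "default": 47.0,
    "PGO -mtune=ivybridge (-O3)": 45.5,
    "PGO -mtune=ivybridge -O2": 44.5,
    "PGO -O2 + unroll/inline/vectorize": 43.9,
    "LigH": 39.5,
}


def slowdown_vs(reference: str, results: dict[str, float]) -> dict[str, float]:
    """Return how much slower each build is than the reference, in percent."""
    ref = results[reference]
    return {name: (secs / ref - 1.0) * 100.0 for name, secs in results.items()}


# Print every build with its time and its slowdown relative to LigH's build.
for name, pct in slowdown_vs("LigH", timings).items():
    print(f"{name:36s} {timings[name]:6.1f} s  {pct:+6.1f} %")
```

    By this measure the default build is about 19 % slower than LigH's, and the best PGO build is still about 11 % slower.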


    giving up here, i have no ideas left.

    Last edited by poller; 20th February 2019 at 21:48.
    poller is offline   Reply With Quote
    Old 21st February 2019, 00:14   #6746  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,763
    OK, I forgot little details I edited a long time ago, while testing some compiling issues with a faulty compiler version. A leftover string is:

    export CXXFLAGS="-march=pentium4 -mtune=generic"

    for the 32-bit compilation (which is still quite generic, just a sensible minimum). That might bring a little advantage. For the 64-bit compilation, the CXXFLAGS is empty.

    Furthermore, for the 32-bit compilation, assembly is disabled for 10 and 12 bit precision cores, but enabled for the 8 bit core.
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid

    Last edited by LigH; 21st February 2019 at 00:17.
    LigH is offline   Reply With Quote
    Old 21st February 2019, 10:39   #6747  |  Link
    Registered User
     
    Join Date: Aug 2016
    Posts: 59
    Quote:
    Originally Posted by Wolfberry View Post
    Supply --svt in the command line to use the SVT-HEVC encoder.
    Never expected that one!

    From //x265.org/x265-svt-hevc-house/:

    Quote:
    With changeset a41325fc854f, the x265 library can invoke the SVT-HEVC library for encoding through the --svt option. We have mapped presets and command-line options supported by the x265 application into the equivalent options of SVT-HEVC, and have added a few specific options that are available only when the SVT-HEVC library is invoked. This page in our documentation describes the steps to build, and invoke the SVT-HEVC library in more detail.

    Our reason for this integration was to enable our users to evaluate additional relative trade-offs between performance and compression efficiency while working behind the familiar API of the x265 library. In the long term, we plan to leverage this integration to further improve x265's ability to handle real-time and low turn-around scenarios in pure software; this is the space that SVT-HEVC was focused on. In parallel, we will continue to innovate on our flagship presets that are used in offline encoding where x265 dominates. You can expect to see these changes in the coming releases of x265, increasing the reach of open-source for video compression!
    Am I being cynical to suggest that Multicoreware couldn't achieve such speed optimisations on their own, so they formed this "synergy"?
    WhatZit is offline   Reply With Quote
    Old 21st February 2019, 10:43   #6748  |  Link
    Registered Developer
     
    Join Date: Mar 2010
    Location: Hamburg/Germany
    Posts: 9,543
    Personally I think it's stupid to incorporate another encoder into the x265 "frontend". If one wanted to use different encoders, one would use say ffmpeg, or just use them directly. x265 should be x265, and nothing else. But oh well. Probably some business driving over common sense.
    __________________
    LAV Filters - open source ffmpeg based media splitter and decoders
    nevcairiel is offline   Reply With Quote
    Old 21st February 2019, 11:27   #6749  |  Link
    Registered User
     
    Join Date: Feb 2012
    Posts: 46
    Lol, good to know I'm not the only one who thinks x265's decision to include another encoder inside itself is stupid. Well, if there's money involved here, I'm not even surprised
    shinchiro is offline   Reply With Quote
    Old 21st February 2019, 21:32   #6750  |  Link
    Moderator
     
    Join Date: Jan 2006
    Location: Portland, OR
    Posts: 2,778
    Quote:
    Originally Posted by shinchiro View Post
    Lol, good to know I'm not the only one who thinks x265's decision to include another encoder inside itself is stupid. Well, if there's money involved here, I'm not even surprised
    From the link, it sounds like the big plan is to start incorporating use of certain SVT-HEVC features/tools within x265. Having a highly accelerated coarse motion search mode could help. Kind of like the OpenGL/CUDA experiments with x264 a while ago.

    x265 has a TON of features where it can take input from a first pass and then refine it. Some of those don't require the stream be made with x265, and a few work with H.264 sources IIRC.
    __________________
    Ben Waggoner
    Principal Video Specialist, Amazon Prime Video

    My Compression Book
    benwaggoner is offline   Reply With Quote
    Old 22nd February 2019, 09:57   #6751  |  Link
    Herr
     
    Join Date: Apr 2009
    Location: North Europe
    Posts: 320
    I just wanted to say that I did a little x265 speed-test, one compile vs another,
    x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem) vs x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit].

    I encoded a 44 second long cartoon animation, 00096.m2ts, with this setting:
    x265.exe --crf 18 --preset veryslow --output-depth 10 --rdoq-level 0 --psy-rdoq 0 --aq-mode 1 --aq-strength 0.4 --qcomp 0.65 --bframes 16 --rc-lookahead 48 --ref 6 --min-keyint 24 --keyint 240 --frame-threads 1 --colormatrix bt709 --deblock -2:-2 --no-sao --psy-rd 0.4 --tskip --tskip-fast --tu-inter 4 --tu-intra 4 --frames 1066


    x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem) Duration: 00:53:41
    x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit] Duration: 00:53:32

    Not a big difference in speed, considering I have an Intel Core i5-5200U CPU (I thought that the ICC 1900-compile would be much faster).
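
    As a rough check of how small that gap is, the two durations can be converted to seconds and compared. This is only a sketch: the helper `to_seconds` and the dictionary keys are names made up here, while the durations and the frame count are the ones quoted above.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS duration string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


FRAMES = 1066  # from "--frames 1066" in the command line above

# Durations as reported in the post; the keys are shorthand labels.
durations = {
    "vs2017-AVX2 (msystem)": to_seconds("00:53:41"),
    "ICC 1900 / MSVC 1916 SVT": to_seconds("00:53:32"),
}

baseline = durations["vs2017-AVX2 (msystem)"]
for name, secs in durations.items():
    saved_pct = (baseline - secs) / baseline * 100.0
    print(f"{name:26s} {secs:5d} s  {FRAMES / secs:.3f} fps  {saved_pct:+.2f} % time saved")
```

    The difference works out to 9 seconds over roughly 53.5 minutes, i.e. under 0.3 %, which may well be within run-to-run noise.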
    Forteen88 is offline   Reply With Quote
    Old 22nd February 2019, 12:13   #6752  |  Link
    Registered User
     
    Selur's Avatar
     
    Join Date: Oct 2001
    Location: Germany
    Posts: 5,765
    Quote:
    I thought that the ICC 1900-compile would be much faster
    to be frank I would have been surprised using a different compiler to have much of an impact,...
    __________________
    Hybrid here in the forum, homepage
    Selur is offline   Reply With Quote
    Old 22nd February 2019, 15:29   #6753  |  Link
    Registered User
     
    Join Date: Sep 2018
    Posts: 10
    Quote:
    Originally Posted by LigH View Post
    OK, I forgot little details I edited a long time ago, while testing some compiling issues with a faulty compiler version. A leftover string is:

    export CXXFLAGS="-march=pentium4 -mtune=generic"

    for the 32-bit compilation (which is still quite generic, just a sensible minimum). That might bring a little advantage. For the 64-bit compilation, the CXXFLAGS is empty.

    Furthermore, for the 32-bit compilation, assembly is disabled for 10 and 12 bit precision cores, but enabled for the 8 bit core.
    well, here not even -march=corei7 did help much.
    assembly needs to be disabled for x86 high bit, it does not compile when enabled.


    Quote:
    Not a big difference in speed, considering I have a Intel Core i5-5200U CPU (I thought that the ICC 1900-compile would be much faster).
    the same here, actually, all x64 builds (from the net) i tested are pretty much on the same level, my own builds included and also the ICC compile.

    but i see differences in the x86 builds. but honestly, not many people will use those anyway.

    Last edited by poller; 22nd February 2019 at 15:32.
    poller is offline   Reply With Quote
    Old 22nd February 2019, 17:51   #6754  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,763
    One more build to compare, with two variants:

    x265 3.0_Au+7-cb3e172a5f51 MABS compiled with media-autobuild_suite only (EXE only, no DLL)

    x265 3.0_Au+7-cb3e172a5f51 compiled with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    LigH is offline   Reply With Quote
    Old 22nd February 2019, 22:05   #6755  |  Link
    Registered User
     
    Join Date: Sep 2018
    Posts: 10
    nice, some small test:

    x265_3.0_RC+14-46b84ff665fd
    20.5 seconds
    Code:
    cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / 
no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00
    x265_3.0_Au+7-cb3e172a5f51
    20.5 seconds
    Code:
    cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / 
no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
    x265_3.0_Au+7-cb3e172a5f51_MABS
    23.3 seconds
    Code:
    cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / 
no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
    my own build
    22.6 seconds
    Code:
    cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / 
no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00

    the MABS build has some additional setting (numa-pools=8) but that did not affect the performance.
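
    Spotting such differences by eye in the long settings strings is tedious. A small Python sketch like the following could do it; the function names are hypothetical, and the two strings are shortened excerpts of the info strings above, not the full output.

```python
from __future__ import annotations


def parse_settings(info: str) -> dict[str, str]:
    """Split an x265 info string ("a=1 / b / no-c") into a key -> value map.

    Flags without "=" (e.g. "wpp", "no-svt") map to an empty string.
    """
    settings: dict[str, str] = {}
    for token in info.split("/"):
        token = token.strip()
        if not token:
            continue
        key, _, value = token.partition("=")
        settings[key] = value
    return settings


def diff_settings(a: dict[str, str], b: dict[str, str]):
    """Return (only in a, only in b, keys whose values differ)."""
    only_a = {k: v for k, v in a.items() if k not in b}
    only_b = {k: v for k, v in b.items() if k not in a}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


# Shortened excerpts of the "my own build" and MABS strings (assumption:
# the omitted middle parts are identical, as they appear to be above).
own_build = "cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-hevc-aq / qp-adaptation-range=1.00"
mabs_build = "cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-hevc-aq / no-svt / qp-adaptation-range=1.00"

only_own, only_mabs, changed = diff_settings(parse_settings(own_build), parse_settings(mabs_build))
print("only in own build:", only_own)
print("only in MABS build:", only_mabs)
print("different values:", changed)
```

    Comparing parsed keys rather than raw text keeps the irregular whitespace in the strings from showing up as a difference.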

    this was tested on a i7-3770k
    poller is offline   Reply With Quote
    Old 22nd February 2019, 23:51   #6756  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,763
    What you may not find here are default GNU C/C++ compiler options.

    Please note that MABS scripts may set up some specific CFLAGS and CXXFLAGS (e.g. O2 or O3?). The interactive MinGW consoles should not ... so GCC / G++ defaults may apply. Except for the 32-bit build where I explicitly set CXXFLAGS with pretty generic options suitable for 32-bit code on any AMD64 capable CPU, minimally (see above).

    I have no clue what I may do "right".
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    LigH is offline   Reply With Quote
    Old Yesterday, 00:32   #6757  |  Link
    Moderator
     
    Join Date: Jan 2006
    Location: Portland, OR
    Posts: 2,778
    Quote:
    Originally Posted by Selur View Post
    to be frank I would have been surprised using a different compiler to have much of an impact,...
    It seems we've seen compilers make about a 10% difference from slowest to fastest. Which is kinda surprising to me given all the hand-tuned assembly that doesn't get compiled.
    __________________
    Ben Waggoner
    Principal Video Specialist, Amazon Prime Video

    My Compression Book
    benwaggoner is offline   Reply With Quote
    Old Yesterday, 00:38   #6758  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,763
    With this amount, the only reason I could imagine is memory alignment...
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    LigH is offline   Reply With Quote
    Old Yesterday, 01:17   #6759  |  Link
    Broadcast Encoder
     
    FranceBB's Avatar
     
    Join Date: Nov 2013
    Location: Germany
    Posts: 486
    Since everyone was concerned about x64 platforms and nobody used x86, I tested it on a real x86 platform running Windows Server 2003 x86 with PAE and 16 GB of RAM.
    The CPU is an old, dusty Intel Xeon 4c/8th running at 2.60GHz with instruction sets up to SSE4.2:

    4/N.A) - x265 3.0_Au+7 - MABS compiled by LigH with media-autobuild_suite only (EXE only, no DLL)

    It didn't even start. It refused to start due to missing kernel calls: GetNumaNodeProcessorMaskEx, InitializeConditionVariable, SetThreadGroupAffinity, SleepConditionVariableCS, WakeAllConditionVariable
    No luck on Windows Server 2003, so it won't run on XP and its derivatives either.

    3) - x265 3.0_Au+7 - compiled by LigH with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells

    3.7fps/3.9fps

    2) - x265 3.0_Au+7 - compiled with GCC9 (Preview) target SSE4.2

    4.2fps/4.3fps

    1) - x265 3.0_Au+7 - compiled with GCC8 target SSE4.2

    4.7fps/4.8fps

    Very basic, low-complexity command line:
    x265.exe --y4m - --dither --preset medium --level 5.0 --tune fastdecode --no-high-tier --ref 2 --rc-lookahead 3 -b 2 --profile main --bitrate 25000 --deblock -4:-4 --min-luma 64 --max-luma 940 --chromaloc 2 --range limited --videoformat component --colorprim bt709 --transfer bt709 --colormatrix bt709 --overscan show --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3 --vbv-maxrate 25000 --vbv-bufsize 25000 --asm=sse4.2 --wpp -o "\\VBOXSVR\Share_Windows_Linux\raw_video.hevc"

    Lossless 16bit SD (UHD SDR downscaled) footage.


    Anyway, I don't think the comparison is fair, 'cause LigH targeted pentium4, which means only SSE2 is supported.
    In other words, I'm comparing SSE4.2 vs SSE2 and it's pretty clear that SSE4.2 has an advantage over SSE2.
    As to GCC9, it seems that they changed something in the way -mtune behaves or maybe they changed something else; anyway, it produces an SSE4.2 build slower than the GCC8 SSE4.2 one.
    It would be interesting to find out how ICC targeting SSE4.2 behaves on old Intel x86 systems (if Intel Parallel Studio can produce a Windows Server 2003 compatible binary).
    FranceBB is offline   Reply With Quote

    All times are GMT +1.

