    Old 11th November 2018, 12:25   #1221  |  Link
    Registered User
     
    Selur's Avatar
     
    Join Date: Oct 2001
    Location: Germany
    Posts: 5,804
    Quote:
    I wish aomenc/vpxenc had GOP-level parallelism.
    Which would require 2pass encoding and a fixed GOP structure (in regard to the GOP sizes); iirc the 2nd pass normally should be able to overwrite the GOP to achieve vbv limits (not totally sure).
    __________________
    Hybrid here in the forum, homepage
    Old 11th November 2018, 12:49   #1222  |  Link
    Registered User
     
    Join Date: May 2014
    Posts: 164
    ffmpeg -hide_banner -t 10 -c:v libaom-av1 -i 1.mp4 -benchmark -f null - (43 fps)
    ffmpeg -hide_banner -t 10 -c:v libdav1d -i 1.mp4 -benchmark -f null - (52 fps)
    ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 1 -tilethreads 2 -i 1.mp4 -benchmark -f null - (61 fps)
    ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 2 -tilethreads 2 -i 1.mp4 -benchmark -f null - (65 fps)
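To put the four runs above on one scale, here is a minimal sketch that divides each frame rate by the libaom baseline. The fps figures are the ones reported in the post; the dictionary labels and the function name are only illustrative.

```python
# Decode speeds (fps) as reported in the post, keyed by an illustrative label.
RESULTS_FPS = {
    "libaom-av1": 43,
    "libdav1d (default threads)": 52,
    "libdav1d -threads 1 -tilethreads 2": 61,
    "libdav1d -threads 2 -tilethreads 2": 65,
}


def speedups(results, baseline):
    """Return each entry's fps divided by the baseline entry's fps."""
    base = results[baseline]
    return {name: fps / base for name, fps in results.items()}


if __name__ == "__main__":
    for name, ratio in speedups(RESULTS_FPS, "libaom-av1").items():
        print(f"{name}: {ratio:.2f}x")
```

On these numbers the fastest dav1d run is roughly 1.5x the libaom decode speed (65 / 43).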
    Old 11th November 2018, 13:08   #1223  |  Link
    Registered User
     
    Join Date: Aug 2015
    Posts: 100
    Quote:
    Originally Posted by v0lt View Post
    sse2, sse4.1?
    It seems that one of the dav1d developers said: "we don't care about mmx/sse2 support anyway" (link). I have no idea about sse4.1.
    Old 11th November 2018, 14:12   #1224  |  Link
    I am maddo saientisto!
     
    SmilingWolf's Avatar
     
    Join Date: Aug 2018
    Posts: 77
    Quote:
    Originally Posted by lvqcl View Post
    It seems that one of the dav1d developers said: "we don't care about mmx/sse2 support anyway" (link). I have no idea about sse4.1.
    BBB is part of TwoOrioles, so that remark may have been about the company, going by its userbase.
    Still, MMX is hardly relevant nowadays. SSE4.1 as the lowest bar doesn't sound too unreasonable.

    Also relevant: https://code.videolan.org/videolan/d.../15#note_22262

    Last edited by SmilingWolf; 11th November 2018 at 17:34.
    Old 11th November 2018, 18:06   #1225  |  Link
    Registered User
     
    Join Date: Aug 2010
    Location: Athens, Greece
    Posts: 2,506
    Quote:
    Originally Posted by lvqcl View Post
    It seems that one of the dav1d developers said: "we don't care about mmx/sse2 support anyway"
    I have no idea about sse4.1.
    Quote:
    Originally Posted by SmilingWolf View Post
    Still, MMX is hardly relevant nowadays. SSE4.1 as the lowest bar doesn't sound too unreasonable.
    MMX is too old and not that beneficial, as its registers are only 64 bits wide (they alias the 80-bit x87 registers).
    SSEx should be the baseline, as it is 128-bit and has had a very fast implementation on all CPUs of the last 10 years.
    SSE2 in particular is mandatory for the x64 architecture.
    From the last link it's obvious that dav1d developers targeted AVX2 for 256bit acceleration using ASM, but not exclusively.
    They are going to optimise for SSEx later.
    So no worries, I think.
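The "AVX2 first, SSEx later, but not exclusively" approach amounts to runtime dispatch: pick the widest instruction set the CPU reports and fall back step by step. The sketch below only illustrates that idea. The flag names, the preference order and `pick_kernel` are assumptions for the example and are not taken from dav1d's source.

```python
# Preference order, widest/newest first. These names are illustrative labels,
# not identifiers from dav1d or any other decoder.
PREFERENCE = ["avx2", "sse4_1", "ssse3", "sse2"]


def pick_kernel(cpu_flags):
    """Return the best supported SIMD level, or "c" for the plain C fallback."""
    flags = {flag.lower() for flag in cpu_flags}
    for level in PREFERENCE:
        if level in flags:
            return level
    return "c"


if __name__ == "__main__":
    # A CPU with AVX but without AVX2 ends up on the SSE4.1 path in this sketch.
    print(pick_kernel({"sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "avx"}))
```

With a table like this, a CPU that lacks AVX2 runs whatever lower-level kernels exist at the time, which is why missing SSE code paths show up as a speed gap rather than a failure.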
    __________________
    Win 10 x64 (17763.379) - Core i3-4170/ iGPU HD 4400 (v.5058)
    HEVC decoding benchmarks
    H.264 DXVA Benchmarks for all
    Old 11th November 2018, 19:50   #1226  |  Link
    Registered User
     
    Join Date: Jul 2018
    Posts: 45
    If they want to go with 4K and 8K videos, they have to use AVX2.
    __________________
    AV1 win64 VS2017 builds
    Last build here | History
    I also open source the build scripts at Github: here
    Old 11th November 2018, 23:30   #1227  |  Link
    Registered User
     
    Nintendo Maniac 64's Avatar
     
    Join Date: Nov 2009
    Location: Northeast Ohio
    Posts: 384
    Quote:
    Originally Posted by NikosD View Post
    Especially SSE2 is mandatory for x64 architecture.
    You can also usually safely target SSE3 (no, not SSSE3) as well since it's supported on all DDR2-capable 64bit x86 CPUs and newer.

    (the only 64bit x86 CPUs that don't support SSE3 are some socket 754 and 939 Athlon 64s which used DDR1)
    Old 12th November 2018, 04:22   #1228  |  Link
    Beyond Kawaii
     
    Mystery Keeper's Avatar
     
    Join Date: Feb 2008
    Location: Russia
    Posts: 700
    Quote:
    Originally Posted by Selur View Post
    Which would require 2pass encoding and a fixed GOP structure (in regard to the GOP sizes); iirc the 2nd pass normally should be able to overwrite the GOP to achieve vbv limits (not totally sure).
    I'm totally fine with that. I usually use 2pass anyway. And, of course, I meant I wish they had it as an option.
    __________________
    ...desu!
    Old 12th November 2018, 11:47   #1229  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,812
    @ Nintendo Maniac 64:

    Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3, but x264/x265 do not use it; they consider the implementation "too slow", I believe.
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    Old 12th November 2018, 22:16   #1230  |  Link
    Registered User
     
    Join Date: Jul 2018
    Posts: 45
    SSE3-optimised av1_nn_predict


    https://aomedia.googlesource.com/aom...6f313f27b1c501

    Quote:
    I have developed a SIMD-optimised neural network implementation using
    SSE3. I have also added functional equivalence tests between this and
    the original implementation. I added aom_clear_system_state() to a few
    places where FPU operations are used after av1_nn_predict.

    Speed-ups over the original C implementation for various network shapes:
    10x64x16: 1.72x
    12x12x1: 2.72x
    12x24x1: 2.35x
    12x32x1: 3.34x
    18x24x4: 0.94x
    18x32x4: 0.93x
    4x16x1: 2.01x
    8x16x1: 1.89x
    8x16x4: 2.02x
    8x24x1: 2.77x
    8x32x1: 2.98x
    8x64x1: 3.76x
    9x32x3: 1.08x
    4x8x4: 1.66x

    A few awkwardly-shaped networks are slightly slower: these could be
    padded to more convenient sizes to use the SIMD kernels.

    I also wrote an AVX/AVX2 implementation but on these relatively small
    networks it was barely faster than the SSE3 code.
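The padding idea from the commit message can be sketched as follows: zero-padding the inputs and the matching weight rows up to a multiple of the SIMD width leaves the layer's output unchanged, because the extra products are all zero. This is a plain-Python illustration, not libaom code. The multiple of 4 is an assumption (one 128-bit SSE register holds four 32-bit floats), and reading a shape like 9x32x3 as inputs x hidden x outputs is also an assumption.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def pad_to_multiple(values, multiple=4):
    """Append zeros until len(values) is a multiple of `multiple`."""
    extra = (-len(values)) % multiple
    return list(values) + [0.0] * extra


def dense_padded(inputs, weights, biases, multiple=4):
    """Same layer, with inputs and weight rows zero-padded first."""
    padded_in = pad_to_multiple(inputs, multiple)
    padded_w = [pad_to_multiple(row, multiple) for row in weights]
    return dense(padded_in, padded_w, biases)


if __name__ == "__main__":
    x = [float(i) for i in range(9)]                 # 9 inputs, not a multiple of 4
    w = [[0.5] * 9, [-1.0] * 9]                      # 2 outputs
    b = [1.0, 2.0]
    print(dense(x, w, b) == dense_padded(x, w, b))   # padding does not change the result
```

The cost is a few wasted multiplications per row, which would have to be weighed against the 0.93x-0.94x slowdowns listed for the awkward shapes.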
    __________________
    AV1 win64 VS2017 builds
    Last build here | History
    I also open source the build scripts at Github: here
    Old 12th November 2018, 22:23   #1231  |  Link
    Registered User
     
    Nintendo Maniac 64's Avatar
     
    Join Date: Nov 2009
    Location: Northeast Ohio
    Posts: 384
    Quote:
    Originally Posted by LigH View Post
    Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3
    ...but this is exactly what I alluded to?

    Athlon 64 CPUs are available on socket 754, 939, and AM2; 754 and 939 used DDR1 memory while AM2 used DDR2, and all AM2 CPUs support SSE3.

    (there are some socket 754 and 939 CPUs that support SSE3, though it's kind of hit and miss).

    Phenom for reference requires at least DDR2.
    Old 13th November 2018, 07:50   #1232  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,812
    I'm sorry, I don't know socket numbers... - so we looked at the same topic from different angles.
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    Old 13th November 2018, 18:26   #1233  |  Link
    Registered User
     
    Join Date: Dec 2008
    Posts: 1,073
    Quote:
    Originally Posted by Wolfberry View Post
    I ran the same test as above and get 16/38/46 fps.
    What is the CPU you use for testing?
    It might be related to the AVX2 code used in dav1d.
    Intel i5-3570k (SSE4.1, SSE4.2, AVX), Windows 7 Sp1 x64.
    __________________
    MPC-BE 1.5.3 (build 4488) stable (SF.net)
    Old 14th November 2018, 23:29   #1234  |  Link
    I am maddo saientisto!
     
    SmilingWolf's Avatar
     
    Join Date: Aug 2018
    Posts: 77
    Status report!
    Previous edition: //www.zs-x.com/showthread.ph...49#post1852449
    Whatever paragraph I don't repeat here can be assumed to be the same as in the aforementioned post

    First of all: graphs!
    Y axis: chosen metric
    X axis: bits per pixel
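The post does not say how bits per pixel was computed. One common definition, assumed here, is the total coded size in bits divided by the total number of luma pixels in the clip. The function name and the example figures are made up for illustration.

```python
def bits_per_pixel(file_size_bytes, width, height, frame_count):
    """Average number of coded bits spent per (luma) pixel over the whole clip."""
    total_bits = file_size_bytes * 8
    total_pixels = width * height * frame_count
    return total_bits / total_pixels


if __name__ == "__main__":
    # Hypothetical 720p clip: 100 frames coded into 1,152,000 bytes.
    print(bits_per_pixel(1_152_000, 1280, 720, 100))
```

Normalising by resolution and frame count this way lets 720p and 1080p curves share one X axis.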

    720p: (graph image not preserved)

    1080p: (graph image not preserved)


    BD rates for 720p:
    Code:
    x264 -> rav1e (yeah you read that right!)
            RATE (%)  DSNR (dB)
     MSSSIM -0.736889 0.0375593
    PSNRHVS -5.5274   0.375081
    
    rav1e -> x265
            RATE (%) DSNR (dB)
     MSSSIM -26.5291 1.29942
    PSNRHVS -27.1134 1.70509
    
    x265 -> libaom
            RATE (%) DSNR (dB)
     MSSSIM -18.9088 0.7852
    PSNRHVS -15.3123 0.761791
    BD rates for 1080p:
    Code:
    x264 -> rav1e (yeah you read that right again!)
            RATE (%) DSNR (dB)
     MSSSIM -4.92009 0.235151
    PSNRHVS -7.23088 0.473125
    
    rav1e -> x265
            RATE (%) DSNR (dB)
     MSSSIM -26.7063 1.16103
    PSNRHVS -28.0007 1.53902
    
    x265 -> libaom
            RATE (%) DSNR (dB)
     MSSSIM -26.486  0.938124
    PSNRHVS -21.7431 0.905916
    Encoders:
    x264 157-2935-545de2f
    x265 2.9-4-471726d3a046
    rav1e 0.1.0-702-ab4d23e2
    libaom 1.0.0-908-g3a607f7b0

    Cmdlines:
    x264 --preset veryslow --tune ssim --crf 16 -o test.x264.crf16.264 orig.i420.y4m
    x265 --preset veryslow --tune ssim --crf 16 -o test.x265.crf16.hevc orig.i420.y4m
    rav1e --low_latency false -o test.rav1e.cq80.ivf --quantizer 80 -s 2 --tune psnr orig.i420.y4m
    aomenc --frame-parallel=0 --tile-columns=3 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --threads=2 --end-usage=q --cq-level=20 --test-decode=fatal -o test.av1.cq20.webm orig.i420.y4m

    Notes:
    So as you can see, the rav1e and aomenc cmdlines have been slightly adjusted to take advantage of the bugfixes and updates from the last few months.
    In particular, Frank Bossen has given rav1e the ability to create a B-pyramid, which almost single-handedly accounts for rav1e's advantage over x264.
    A word of warning on this last point: it's still kind of a mixed bag. In very flat, static scenes like PresageFlowerWalk, x264 still rules by quite a margin, while rav1e takes the crown in clips like F.Y.C and PresageFlowerFight:
    Code:
    F.Y.C, x264 -> rav1e:
            RATE (%) DSNR (dB)
     MSSSIM -18.451  1.01281
    PSNRHVS -25.7463 2.03419
    
    PresageFlowerFight, x264 -> rav1e:
            RATE (%) DSNR (dB)
     MSSSIM -31.4953 1.80761
    PSNRHVS -31.0827 2.27546
    
    PresageFlowerWalk, x264 -> rav1e:
            RATE (%) DSNR (dB)
     MSSSIM 66.2264 -1.70084
    PSNRHVS 70.8208 -2.28853
    (as always, a negative BD rate means improvement, positive means regression)
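For readers unfamiliar with the sign convention, here is a simplified sketch of how a BD-rate figure can be derived from two rate/quality curves. The usual Bjøntegaard method fits a cubic polynomial to log-bitrate as a function of quality; this sketch swaps that for piecewise-linear interpolation to stay short. It is not the tool used for the tables above, and all names and sample points are hypothetical.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation over strictly increasing xs (clamped)."""
    x = min(max(x, xs[0]), xs[-1])
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("need at least two points")


def avg_log_rate(points, lo, hi, steps=1000):
    """Mean of ln(bitrate) over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])      # (bitrate, quality), by quality
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    total = 0.0
    for i in range(steps):
        a = lo + (hi - lo) * i / steps
        b = lo + (hi - lo) * (i + 1) / steps
        total += 0.5 * (interp(a, quality, log_rate)
                        + interp(b, quality, log_rate)) * (b - a)
    return total / (hi - lo)


def bd_rate(anchor, test):
    """Average bitrate change (%) of `test` versus `anchor` at equal quality."""
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    diff = avg_log_rate(test, lo, hi) - avg_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


if __name__ == "__main__":
    anchor = [(1000, 30.0), (2000, 35.0), (4000, 40.0)]   # (kbps, dB), made up
    test = [(500, 30.0), (1000, 35.0), (2000, 40.0)]      # same quality, half the bits
    print(round(bd_rate(anchor, test), 3))
```

A test encoder that needs half the bits for the same quality comes out at about -50%, which is why negative means improvement.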

    Considerations about times with libaom:
    I'm using my desktop PC to run all the encodes. It is also my main study/work PC, so the times can be quite off. Plus, I run multiple encodes in parallel, which further messes up the timings.
    HOWEVER, between annoying bugs and a lot of other stuff, the first report cost me nearly a week (this includes having to re-run some encodes because sh*t happened) ONLY to encode with libaom.
    Taking advantage of the recent bugfixes and improvements, I have been able to rework my workflow and bring that time down to only a couple of days, WITHOUT having to touch the --cpu-used parameter and with no night-time encoding.
    All in all, I am pretty satisfied.

    This concludes my (bi-monthly?) report.
    As always, I'm open to any kind of feedback to improve my comparisons and my encodes.

    Last edited by SmilingWolf; 14th November 2018 at 23:34.
    Old 16th November 2018, 18:53   #1235  |  Link
    Moderator
     
    Join Date: Jan 2006
    Location: Portland, OR
    Posts: 2,810
    So, what's everyone's favorite AV1 decoder app on Windows? Chrome doesn't seem to be converting from video to PC range correctly (blacks are washed out, contrast is low, etc.). Is there a nightly of something that does AV1 correctly for an apples-to-apples comparison?
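For reference, the video-to-PC range conversion in question is a simple linear stretch. In 8-bit limited ("video") range, luma runs from 16 (black) to 235 (white); if those codes are shown without expansion, black appears as dark gray. The sketch below shows the standard 8-bit luma mapping. The function name is made up, and it makes no claim about what Chrome actually does internally.

```python
def limited_to_full_luma(y):
    """Map an 8-bit limited-range luma code (16..235) to full range (0..255)."""
    scaled = (y - 16) * 255 / 219          # 219 = 235 - 16 code values
    return max(0, min(255, round(scaled)))  # clamp below-black / above-white codes


if __name__ == "__main__":
    for code in (16, 126, 235):
        print(code, "->", limited_to_full_luma(code))
```

Skipping this step leaves black at code 16 and white at 235 on a 0..255 display, which matches the washed-out, low-contrast look described.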
    __________________
    Ben Waggoner
    Principal Video Specialist, Amazon Prime Video

    My Compression Book
    Old 16th November 2018, 21:45   #1236  |  Link
    I am maddo saientisto!
     
    SmilingWolf's Avatar
     
    Join Date: Aug 2018
    Posts: 77
    Quote:
    Originally Posted by benwaggoner View Post
    So, what's everyone's favorite AV1 decoder app on Windows? Chrome doesn't seem to be converting from video to PC range correctly (blacks are washed out, contrast is low, etc.). Is there a nightly of something that does AV1 correctly for an apples-to-apples comparison?
    VLC 3.0.5 (Nightly). I fixed my nVidia settings just today because I had that same problem while playing back the ToS fragment I use for the tests. It plays correctly now.
    Alternatively, ffplay for quick stuff when I already have a bunch of command prompts open in the right path.

    Last edited by SmilingWolf; 17th November 2018 at 11:30.
    Old 16th November 2018, 22:27   #1237  |  Link
    German doom9/Gleitz SuMo
     
    LigH's Avatar
     
    Join Date: Oct 2001
    Location: Germany, rural Altmark
    Posts: 5,812
    I use MPC-HC almost exclusively, which uses LAV Filters through a direct API. It was able to play AV1 clips from the YouTube beta playlist and some tiny encodes of my own (I don't have powerful CPUs available). So I only have limited experience so far, but it appears to work.
    __________________

    New German Gleitz board
    MediaFire: x264 | x265 | VPx | AOM | Xvid
    Old 18th November 2018, 11:40   #1238  |  Link
    I am maddo saientisto!
     
    SmilingWolf's Avatar
     
    Join Date: Aug 2018
    Posts: 77
    32/64bits binaries (GCC 9.0):
    av1-1.0.0-941-gd2a592e1c: https://mega.nz/#!F5Am2KyK!9aQ6_7mM2...6_OaZahvKCHPWQ
    Old 19th November 2018, 09:51   #1239  |  Link
    Registered User
     
    mandarinka's Avatar
     
    Join Date: Jan 2007
    Posts: 714
    Quote:
    Originally Posted by Mystery Keeper View Post
    I wish aomenc/vpxenc had GOP-level parallelism. When each thread is encoding one GOP, and then they are stitched together. That would make use of all CPU power without compromising quality/compression.
    You could get the same result by manually splitting the source into X parts and encoding them separately at the same time. I'm not sure how well libvpx/libaom cope with that. It works great with x264 and x265 (using raw output at least).
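The manual split-and-stitch approach can be sketched as below: cut the frame range into contiguous chunks, run one encode per chunk concurrently, and join the results in timeline order. `encode_chunk` is a hypothetical stand-in that only returns a label. A real version would launch an encoder process per chunk, and the even split is an assumption too, since in practice you would cut at scene changes so each chunk starts a new GOP.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_frames, parts):
    """Split [0, total_frames) into `parts` contiguous, near-equal frame ranges."""
    base, extra = divmod(total_frames, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def encode_chunk(frame_range):
    """Hypothetical stand-in for one encoder run; returns a label for the chunk."""
    start, end = frame_range
    return f"chunk[{start}:{end}]"


def encode_in_parallel(total_frames, parts, encode=encode_chunk):
    """Encode all chunks concurrently and return the results in timeline order."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # pool.map preserves input order, so the chunks can be stitched directly.
        return list(pool.map(encode, split_ranges(total_frames, parts)))


if __name__ == "__main__":
    print(encode_in_parallel(10, 3))
```

Each chunk is rate-controlled on its own here, so this sketch does nothing about the VBV concern raised earlier in the thread.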
    Old 19th November 2018, 09:58   #1240  |  Link
    Registered User
     
    mandarinka's Avatar
     
    Join Date: Jan 2007
    Posts: 714
    Quote:
    Originally Posted by LigH View Post
    @ Nintendo Maniac 64:

    Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3, but x264/x265 do not use it; they consider the implementation "too slow", I believe.
    SSE3 is not particularly useful for multimedia; it's just a few instructions introduced in the Prescott P4 and the Venice 90nm K8.

    You probably mean SSSE3 (SSS instead of SS), aka "Supplemental SSE3", which is a confusing and dumb name. It probably should have been SSE4 but got renamed for marketing reasons. Or SSE3 was not supposed to be SSE3 originally.

    SSSE3 is very useful for encoding and decoding, but it only arrived with Core 2 chips, and with Bobcat/Bulldozer and later cores from AMD. K10 and K8 end at the not-so-important SSE3.
    (Note that x265 actually needs SSSE3 + SSE4 to be useful; you are barred from most of the assembly optimizations if you only have SSSE3, as with 65nm Core 2s or pre-Sandy Bridge Pentium/Celeron).

    Last edited by mandarinka; 19th November 2018 at 10:01.