    9th November 2018, 09:05   #21
    ErazorTT (Registered User)
    Join Date: Mar 2003
    Location: Germany
    Posts: 58
    OK, so all builds apart from simd256 work with fft3dfilter. However, the old simd128+256 build appears to have been a wee bit faster with dfttest than any of the new builds.

    Out of curiosity: what do you actually mean by simd 128 or 256, in contrast to sse2/avx/avx2? After all, SSE2 is a 128-bit SIMD instruction set, and AVX/AVX2 add 256-bit SIMD instructions on top of it.

    Last edited by ErazorTT; 9th November 2018 at 09:13.
    9th November 2018, 13:08   #22
    Wolfberry (Helenium(Easter))
    Join Date: Aug 2017
    Location: Hsinchu, Taiwan
    Posts: 82
    Code:
      --enable-sse2             enable SSE/SSE2 optimizations
      --enable-avx              enable AVX optimizations
      --enable-avx2             enable AVX2 optimizations
      --enable-avx512           enable AVX512 optimizations
      --enable-avx-128-fma      enable AVX128/FMA optimizations
      --enable-kcvi             enable Knights Corner vector instructions optimizations
      --enable-altivec          enable Altivec optimizations
      --enable-vsx              enable IBM VSX optimizations
      --enable-neon             enable ARM NEON optimizations
      --enable-generic-simd128  enable generic (gcc) 128-bit SIMD optimizations
      --enable-generic-simd256  enable generic (gcc) 256-bit SIMD optimizations
    Above are some of the flags you can pass to configure.
    The SIMD builds also had SSE2/AVX/AVX2 enabled, but I am not sure whether that is worth it.
    AFAIK, generic-simd128/256 is some kind of generic AVX(2); I am not sure how generic they are.

    The FFTW release notes say:
    Quote:
    enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain
    And the more code paths you enable, the fatter the DLLs will be.
    Quote:
    Originally Posted by HolyWu View Post
    I especially generate codelets of typical sizes 4, 16, 32 and 64 so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.
    The future builds will have these codelets generated as well.
    __________________
    media-autobuild_suite builds / FFTW

    Last edited by Wolfberry; 9th November 2018 at 13:12.
    9th November 2018, 18:46   #23
    ErazorTT (Registered User)
    Join Date: Mar 2003
    Location: Germany
    Posts: 58
    So the generic options are based on the compiler's own vectorization and optimization.

    Have you tried increasing the alignment with --with-incoming-stack-boundary, as suggested here: //www.zs-x.com/showthread.p...80#post1857180
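
    For readers wondering what "generic (gcc) 128-bit/256-bit SIMD" could look like in practice, here is a minimal sketch using the GCC/Clang vector extensions. It is not taken from the FFTW sources; the type names v4f/v8f and the helpers muladd_128/muladd_256 are made up for illustration. The point is that the source names only a vector width, and the compiler decides which instructions to emit for the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extensions: fixed-width vector types with no
   instruction set named in the source. */
typedef float v4f __attribute__((vector_size(16))); /* 128 bits: 4 floats */
typedef float v8f __attribute__((vector_size(32))); /* 256 bits: 8 floats */

enum { LANES_128 = sizeof(v4f) / sizeof(float),
       LANES_256 = sizeof(v8f) / sizeof(float) };

/* Hypothetical helper: dst[i] = a[i] * b[i] + c[i], four floats per step. */
void muladd_128(float *dst, const float *a, const float *b,
                const float *c, size_t n)
{
    assert(n % LANES_128 == 0);
    for (size_t i = 0; i < n; i += LANES_128) {
        v4f va, vb, vc, r;
        memcpy(&va, a + i, sizeof va);   /* loads that do not require alignment */
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        r = va * vb + vc;                /* element-wise arithmetic */
        memcpy(dst + i, &r, sizeof r);
    }
}

/* Same operation, eight floats per step. */
void muladd_256(float *dst, const float *a, const float *b,
                const float *c, size_t n)
{
    assert(n % LANES_256 == 0);
    for (size_t i = 0; i < n; i += LANES_256) {
        v8f va, vb, vc, r;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        r = va * vb + vc;
        memcpy(dst + i, &r, sizeof r);
    }
}
```

    When the target has no 256-bit registers enabled, a compiler will typically lower the v8f operations to narrower instructions, so the same source still builds; whether that is faster than a hand-written SSE2/AVX path is exactly the kind of thing that has to be measured.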
    10th November 2018, 01:10   #24
    Wolfberry (Helenium(Easter))
    Join Date: Aug 2017
    Location: Hsinchu, Taiwan
    Posts: 82
    Quote:
    Originally Posted by Groucho2004 View Post
    There are a number of guidelines here about building fftw.
    The official guide for building FFTW on Windows is outdated; I consider BUILD-MINGW32, BUILD-MINGW64 and PKGBUILD better references.

    Quote:
    On win32, some versions of gcc assume that the stack is 16-byte aligned, but code compiled with other compilers may only guarantee a 4-byte alignment, resulting in mysterious segfaults.
    As quoted, --with-incoming-stack-boundary=2 applies only to win32 (x86), not x64. On x64, gcc rejects the resulting flag during configure:
    Code:
    configure:15780: checking whether C compiler accepts -mincoming-stack-boundary=2
    configure:15795: gcc -c -mincoming-stack-boundary=2  conftest.c >&5
    cc1.exe: error: -mincoming-stack-boundary=2 is not between 3 and 12
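
    To make the numbers concrete: GCC treats the argument of -mincoming-stack-boundary as an exponent, so N means the incoming stack is assumed to be aligned to 2^N bytes. The sketch below spells that out. The function names are hypothetical, and the 3..12 range is copied from the cc1.exe message above for that particular x64 compiler, not stated as a general rule.

```c
#include <assert.h>

/* Alignment in bytes implied by -mincoming-stack-boundary=n, i.e. 2^n. */
unsigned long stack_boundary_bytes(unsigned n)
{
    assert(n < 8 * sizeof(unsigned long));
    return 1UL << n;
}

/* Range check mirroring the quoted error "is not between 3 and 12".
   The bounds come from that log only (assumption: same x64 gcc). */
int accepted_by_logged_x64_gcc(unsigned n)
{
    return n >= 3 && n <= 12;
}
```

    With these helpers, a value of 2 corresponds to the 4-byte alignment mentioned in the quote and 4 to the 16-byte alignment gcc assumes, while 2 falls outside the range the x64 compiler in the log accepts.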
    __________________
    media-autobuild_suite builds / FFTW

    Tags
    fftw, fftw3.dll

