BTC FPGA Miner Challenge -- Best Hashrate, Lowest power per hash

  • Status: Pending
  • Prize: $840
  • Entries received: 0

Public Clarification Board

  • TotallyLost
    Contest Holder
    • 11 months ago

    "Do or do not. There is no try" said Yoda's character ... the foundation for success in all walks of life.

    67 entries, and none has posted the REQUIRED open source GitHub link as its contest graphic, with at least the basic required functionality, which was the minimum for entry.

    I'm really disappointed that $840 is not enough for someone to port and optimize the existing FPGA mining software for the XC7Z010, 10M08, and GW1NR-9 devices. This task should be achievable in less than a week's work, or in a couple of days for a skilled college student with reconfigurable computing skills.

    Yet one person offered to hire someone to do the work if I awarded them the contest. My reply: You are welcome to submit your open source GitHub solution to the contest, and you will be awarded the prize if it's the best.

    Suggestions?

    • 11 months ago
    1. TotallyLost
      Contest Holder
      • 11 months ago

      Dear John Lawrence B,

      I hope this message finds you well. I am writing to offer you a solution for your contest. I would like to propose an alternative approach.

      I am confident that I can help you achieve your goals. If you award me the prize money for the contest, I will take it upon myself to find a qualified professional who can fulfill your requirements. I will handle the project management and ensure that your needs are met without you incurring any additional costs. I will also take care of publishing the project according to my own strategy, which I believe will yield the desired results.

      As a gesture of my commitment, I am willing to take on the responsibility of finding the right freelancer and overseeing the project at no cost to you. I am confident in my ability to deliver the outcome you are seeking.

      Please consider my proposal, and I am happy to discuss any further details or answer any questions you may have. Thank you for your consideration.

      Sincerely,

      • 11 months ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Step 1: Start by cloning https://github.com/fpgaminer/Open-Source-FPGA-Bitcoin-Miner and building it for the three FPGA target devices, using an updated gateway and mining proxy. Deliver a completed GitHub repository with sources, prebuilt RPI/FPGA images, a wiring diagram for the 3 boards, and the required testing report on the GitHub README page. Post this GitHub link as your contest entry graphic. Entries without this minimum will be rejected. (3-20 hrs)

    Step 2: Incorporate the round folding found in nalex87/Verilog-SHA256-1/blob/master/main.v (2-8hrs)

    Step 3: Use floor planner on all three devices to optimize for best case hash rate, at the lowest power. (10+hrs)

    Step 4: Apply additional improvements to each target device. LUT packing, worst case delay mgmt.

    Step 5: Update public github sources, install images working/tested on all three platforms, and ReadMe project report for peer review. Use RPI or Petalinux proxy. Make sure your contest entry graphic displays your github link.

    Step 6: Peer review rank top 3 entries.

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Your successful submission requires 3 implementations using Altera, GOWIN, and Xilinx student/hobby boards:

    Tang nano 9K board with GOWIN GW1NR-9 FPGA (about $15 from Sipeed on AliExpress or eBay)
    Intel Dev Kit EK10M08E144 with Altera 10M08 FPGA (about $52 from Mouser or eBay)
    Xilinx XC7Z010 Development Board (about $22 from Shengzhi on AliExpress)
    Raspberry Pi Pico W for Mining controller (About $5-10 from AliExpress or eBay)

    These may use a heat sink and fan for best hash rate.

    Raspberry Pi Pico is the controller that sets up the FPGA work and handles the WiFi communication for managing Solo or Pool work assignments. Alternatively, omit the RPI Pico and use the EBAZ4205 PS section as the cluster controller with Ethernet (EBAZ4205 wired to the Gowin and Altera boards).

    BTC mining lotto randomly gives away $100,000+ every 10 minutes. More boards, better odds.

    $22 XC7Z010 board: https://www.aliexpress.us/item/3256803866335473.html
    or Digilent Arty Z7-10 with Xilinx XC7Z010 (about $200 from Digilent)

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Ok ... extended contest time again, and added more money to the prize :)

    FYI: resources for the ebaz4205 board with XC7Z010

    https://theokelo.co.ke/getting-starting-with-ebaz4205-zynq-7000/
    http://cholla.mmto.org/ebaz4205/
    https://github.com/trebisky/ebaz4205_miner

    In one of the first comments for this project, I opened the discussion about collapsing rounds to remove registers, and bring more combinatorial logic into the rounds. I had done this in a similar sha256 project back in 2012 ... here is another SHA256 designer that did a similar design in 2017:

    https://github.com/nalex87/Verilog-SHA256-1/blob/master/main.v

    My 2012 design included reordering operations and LUT packing to optimize the worst case delay path. I moved the WK[] = W[] + K[] operation from the compressor into the expander, plus a few other optimizations.

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    For those not familiar with implementing SHA256, these are the functions/macros used in the expander and compressor.

    #define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
    #define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

    #define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
    #define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))
    #define SIGMA0(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
    #define SIGMA1(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
    #define sigma0(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ ((x) >> 3))
    #define sigma1(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ ((x) >> 10))
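
    As a sanity check, the Ch and Maj macros above are reduced-operation forms of the textbook definitions in FIPS 180-4. The sketch below compares the two forms. The helper names ch_ref, maj_ref, and macros_match_reference are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Macros as posted above. */
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))

/* Textbook forms from FIPS 180-4 (helper names are illustrative). */
static uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Returns 1 when both macros agree with the textbook forms for x, y, z. */
int macros_match_reference(uint32_t x, uint32_t y, uint32_t z)
{
    return Ch(x, y, z) == ch_ref(x, y, z) &&
           Maj(x, y, z) == maj_ref(x, y, z);
}
```

    Both forms are bitwise, so agreement on a handful of mixed bit patterns is a reasonable smoke test before mapping either form onto LUTs.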

    A good, tight, fast, low power design can implement these as manually placed IP blocks created in the floor planner, and call the IP blocks out in the Verilog rather than use word level operator synthesis in Verilog.

    Likewise each pipeline round can be reduced to an IP block that is manually placed using the floor planner. This will minimize routing length delays/power.

    Good P&R is an exponential, NP-hard problem.

    • 1 year ago
    1. AbhishekEG
      AbhishekEG
      • 1 year ago

      Each pipeline round can also be reduced to an IP block that is manually placed using the floor planner, which can help to further optimize the design. However, the process of optimizing the design using place-and-route (P&R) tools is complex and computationally intensive, and finding a good, tight, fast, and low-power design is an exponential problem that is known to be NP-hard.

      • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago


    Outline the report here, plus a few replies due to character limits:

    [centered]
    Team Report: [your GitHub/repo name]
    BTC FPGA Miner Challenge
    Best Hashrate, Lowest power per hash

    [left]
    Summary

    Tang Nano 9K hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W per MH/sec
    Altera 10M08 hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W per MH/sec
    Xilinx XC7Z010 hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W per MH/sec

    Tang Nano 9K stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached
    Altera 10M08 stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached
    Xilinx XC7Z010 stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached

    Tang Nano 9K dynamic power: 0.0A at 0.0Volts is 0.0Watts
    Altera 10M08 dynamic power: 0.0A at 0.0Volts is 0.0Watts
    Xilinx XC7Z010 dynamic power: 0.0A at 0.0Volts is 0.0Watts
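
    The power and efficiency entries in this summary follow from the measured current, voltage, and hash rate. A minimal sketch of that arithmetic is below. The function names are hypothetical, and the figures in the usage note are made-up placeholders, not measurements.

```c
/* Board power in watts from measured supply current and voltage. */
double board_power_w(double amps, double volts)
{
    return amps * volts;
}

/* Efficiency in watts per MH/sec, which is numerically the same as
 * joules per megahash. */
double watts_per_mhs(double amps, double volts, double mhs)
{
    return board_power_w(amps, volts) / mhs;
}
```

    For example, a hypothetical board drawing 0.5 A at 5.0 V while hashing at 10 MH/sec would report 2.5 W and 0.25 W per MH/sec.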

    • 1 year ago
    1. TotallyLost
      Contest Holder
      • 1 year ago

      Report summary continues with:

      Tang Nano 9K static power: 0.0A at 0.0Volts is 0.0Watts
      Altera 10M08 static power: 0.0A at 0.0Volts is 0.0Watts
      Xilinx XC7Z010 static power: 0.0A at 0.0Volts is 0.0Watts


      Solo Mining tested with node: [Node name and IP address] with average rate of 0.0 MH/sec
      Pool Mining tested with pool: [Pool name and IP address] with average rate of 0.0 MH/sec

      We have verified that each device meets all setup and hold times at idle, with operation inside vendor specified worst case operating conditions for our designs. Solo and Pool Mining results are the sum of one each Tang, Altera, and Xilinx device operating concurrently from the RPI Pico W mining controller(s).

      Team Lead: [Your name]
      Team Members: [Team member list]

      • 1 year ago
    2. TotallyLost
      Contest Holder
      • 1 year ago

      Main section of the report will include, at minimum:

      Project Design Methodology

      [describe the overall architecture of your design, common to all devices]

      [describe the methods used to improve performance on the Tang Nano 9K device]

      [describe the methods used to improve performance on the Altera 10M08 device]

      [describe the methods used to improve performance on the Xilinx XC7Z010 device]

      • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Googling for variations of "Bitcoin FPGA mining" gets a lot of hits. Some are pretty cool, as this has been a fun project for a lot of engineers over the years.

    http://www.cs.columbia.edu/~sedwards/classes/2014/4840/reports/Half-fast.pdf
    http://www.cs.columbia.edu/~sedwards/classes/2014/4840/reports/Half-fast.tar.gz

    On GitHub this is another gem .... kramble/DE0-Nano-BitCoin-Miner

    And a lot more

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    This diagram is a good conceptualization tool, which highlights the 7-input adder that is central to the SHA-256 compression rounds: https://commons.wikimedia.org/wiki/File:SHA-2.svg#mediaviewer/File:SHA-2.svg

    Implementing this function:
    b[i+1].a = (b[i].h + SIGMA1(b[i].e) + Ch(b[i].e, b[i].f, b[i].g) + b[i].K + b[i].W) + (SIGMA0(b[i].a) + Maj(b[i].a, b[i].b, b[i].c));

    N-input adders are an interesting topic that many people ignore.

    https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8843959

    https://www.epfl.ch/labs/lap/wp-content/uploads/2018/05/ParandehAfsharJan08_EfficientSynthesisOfCompressorTreesOnFpgas_ASPDAC08.pdf
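
    To illustrate why the 7-input adder deserves attention, here is a minimal C model of a carry-save (3:2 compressor) tree. It reduces seven 32-bit operands to two using only bitwise logic, so a single carry-propagate add remains at the end. The names csa and add7_carry_save are hypothetical, and the tree shape is just one possible arrangement, not the one used in any design referenced above.

```c
#include <stdint.h>

/* 3:2 compressor: three operands in, sum and carry words out.
 * No carry ripples between bit positions, so the delay is one
 * logic level regardless of word width. */
static void csa(uint32_t x, uint32_t y, uint32_t z,
                uint32_t *sum, uint32_t *carry)
{
    *sum = x ^ y ^ z;
    *carry = ((x & y) | (x & z) | (y & z)) << 1;
}

/* Adds seven operands modulo 2^32 with five compressors arranged in
 * four levels, followed by one carry-propagate add. */
uint32_t add7_carry_save(const uint32_t in[7])
{
    uint32_t s1, c1, s2, c2, s3, c3, s4, c4, s5, c5;

    csa(in[0], in[1], in[2], &s1, &c1);  /* level 1 */
    csa(in[3], in[4], in[5], &s2, &c2);  /* level 1 */
    csa(s1, c1, s2, &s3, &c3);           /* level 2 */
    csa(c2, in[6], s3, &s4, &c4);        /* level 3 */
    csa(c3, s4, c4, &s5, &c5);           /* level 4 */

    return s5 + c5;                      /* the only rippling add */
}
```

    In the round equation above, the seven operands would be h, SIGMA1(e), Ch(e, f, g), K, W, SIGMA0(a), and Maj(a, b, c). A chain of six ordinary two-input adders would ripple carries six times instead of once.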

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    With Crypto projects, using lots of small FPGAs for high performance reconfigurable computing is a simple solution to the otherwise impossible power and thermal management of large FPGAs.

    Early (and even some current) FPGA designs target traditional control and glue logic designs where only 5-15% of the logic is "actively switching". Dense Crypto algorithms have an average toggle rate of roughly 50% of gates, simply because the Crypto algorithms are attempting to fully randomize bits ... a coin toss per gate to switch, or not switch ... to be a zero or a one ... each bit retains its previous value about 50% of the time, and toggles about 50% of the time.

    Toggling consumes power to charge parasitic capacitance in gates and routing, or to shunt that charge to the ground rail. ASIC miner chips face the same problem ... they are all relatively small chips, and a miner uses a lot of them, to distribute the heat across many chips.

    Fast, power-efficient designs are critical for usable mining rigs.
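
    The effect of toggle rate on power can be estimated with the usual first-order CMOS model, commonly written as P = alpha * C * V^2 * f, where alpha is the fraction of nodes switching per clock. The sketch below is only that model. The function name is hypothetical, and the capacitance, voltage, and clock values in the usage note are placeholders, not device data.

```c
/* First-order dynamic power estimate in watts.
 * alpha:       average fraction of nodes toggling per clock (0..1)
 * cap_farads:  total switched capacitance
 * volts:       core supply voltage
 * freq_hz:     clock frequency */
double dynamic_power_w(double alpha, double cap_farads,
                       double volts, double freq_hz)
{
    return alpha * cap_farads * volts * volts * freq_hz;
}
```

    With the same placeholder capacitance (1 nF), voltage (1.0 V), and clock (100 MHz), a 50% toggle rate gives five times the dynamic power of a 10% toggle rate, which is the gap between a dense hashing design and typical glue logic.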

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Sipeed has offered to refund the cost of a Tang Nano 9K board for contestants who complete the contest. A small but generous sponsor offer for this contest's developers.

    Hello,
    Thank you for your support for our Tang product!
    We can return the Tang FPGA board fees for developers who have successfully submit your mining contest.

    吴才泽 / Caesar Wu
    深圳矽速科技有限公司 Shenzhen Sipeed Tech Ltd

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    There are more than a dozen area vs. time design solutions, allowing a mix of different implementations to best fill a device to capacity for the best maximum hash rate. So let's explore architectures that may allow using idle resources in an FPGA, and/or better packing.

    1) Using block ram as memories, with a simple ALU design.

    2) Using LUT ram as memories, possibly dual port, with a simple ALU design.

    3) Using block ram as sequencers, including main counters.

    4) Using LUT ROMs as sequencers to compact control logic.

    5) Factoring control logic and K memories to feed and control multiple 'slimmer' expander/compressors.

    Each of these different architectural approaches to the problem can be mixed and matched, giving multiple very different solutions, even for the smallest devices. With more than a dozen block memories available, combining them with a LUT-based ALU and sequencer provides valuable additional hashers. Finding the optimal ratios is a simple linear programming problem.
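
    As a rough illustration of the ratio problem, the sketch below searches for the best mix of two hasher variants under LUT and block RAM budgets. It is a brute-force integer search rather than a real LP solver. The struct and function names are hypothetical, and every resource and hash-rate figure in the usage note is a made-up placeholder.

```c
#include <assert.h>

/* Resource cost and throughput of one hasher instance (illustrative). */
struct variant { int luts; int brams; double mhs; };

/* Chosen instance counts and the resulting total hash rate. */
struct mix { int n_a; int n_b; double mhs; };

/* Tries every feasible (n_a, n_b) pair and keeps the fastest one. */
struct mix best_mix(struct variant a, struct variant b,
                    int lut_budget, int bram_budget)
{
    struct mix best = {0, 0, 0.0};

    /* Each variant must cost at least one LUT so the loops terminate. */
    assert(a.luts > 0 && b.luts > 0);

    for (int na = 0; na * a.luts <= lut_budget &&
                     na * a.brams <= bram_budget; na++) {
        for (int nb = 0; na * a.luts + nb * b.luts <= lut_budget &&
                         na * a.brams + nb * b.brams <= bram_budget; nb++) {
            double rate = na * a.mhs + nb * b.mhs;
            if (rate > best.mhs) {
                best.n_a = na;
                best.n_b = nb;
                best.mhs = rate;
            }
        }
    }
    return best;
}
```

    With a placeholder LUT-only pipeline (3000 LUTs, 0 block RAMs, 10 MH/sec) and a placeholder block-RAM hasher (500 LUTs, 2 block RAMs, 1 MH/sec) on an 8000-LUT, 10-block-RAM device, the search picks two pipelines plus four block-RAM hashers for 24 MH/sec. The small hashers soak up resources the pipelines leave idle.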

    Multiple team members help here.

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    AND ... there are other significant optimization strategies NOT provided ... use your skills. Put a good team together, pool your formal training, interests, expertise and experience. WIN YOUR bragging rights, and EARN your TOP job in the FPGA reconfigurable computing accelerator industry!

    The skills learned and demonstrated in this project are extremely valuable when applied to real world algorithm implementation for FPGA accelerated data centers. This project should become a good resume builder, as reconfigurable computing emerges from research to production. Other PoW algorithms for blockchain can also best be served with FPGA reconfigurable computing, since expensive ASIC implementations are not very flexible. BTC is just one of many that are easily implemented on an FPGA.

    I'll take donations from other entities, vendors, and mentors to increase the prize for this contest. I'm semi-retired, and the nearly $600 for this contest with fees, is the limit of my personal budget.

    Suggestions?

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    In C:
    struct buf {
        unsigned int K, W, a, b, c, d, e, f, g, h;
    } b[65];

    with rounds looking like:

    b[i+1].a = (b[i+0].h + SIGMA1(b[i+0].e) + Ch(b[i+0].e, b[i+0].f, b[i+0].g) + b[i+0].K + b[i+0].W) +
               (SIGMA0(b[i+0].a) + Maj(b[i+0].a, b[i+0].b, b[i+0].c));
    b[i+1].b = b[i+0].a;
    b[i+1].c = b[i+0].b;
    b[i+1].d = b[i+0].c;
    b[i+1].e = b[i+0].d + (b[i+0].h + SIGMA1(b[i+0].e) + Ch(b[i+0].e, b[i+0].f, b[i+0].g) + b[i+0].K + b[i+0].W);
    b[i+1].f = b[i+0].e;
    b[i+1].g = b[i+0].f;
    b[i+1].h = b[i+0].g;
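
    For anyone who wants a software reference to check their hardware against, here is a minimal sketch that wraps the rounds above into a one-block SHA-256 using the same b[65] layout. The function names sha256_block and sha256_short are hypothetical, and the message expander and padding follow the standard FIPS 180-4 definitions rather than anything posted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))
#define SIGMA0(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
#define SIGMA1(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
#define sigma0(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ ((x) >> 3))
#define sigma1(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ ((x) >> 10))

/* SHA-256 round constants and initial hash value (FIPS 180-4). */
static const uint32_t K256[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
static const uint32_t H0[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

struct buf { uint32_t K, W, a, b, c, d, e, f, g, h; };

/* Compresses one 64-byte block into state[8] using the b[65] stage model. */
void sha256_block(uint32_t state[8], const uint8_t block[64])
{
    struct buf b[65];
    uint32_t W[64];
    int i;

    /* Expander: 16 big-endian message words, then 48 derived words. */
    for (i = 0; i < 16; i++)
        W[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (i = 16; i < 64; i++)
        W[i] = sigma1(W[i - 2]) + W[i - 7] + sigma0(W[i - 15]) + W[i - 16];

    b[0].a = state[0]; b[0].b = state[1]; b[0].c = state[2]; b[0].d = state[3];
    b[0].e = state[4]; b[0].f = state[5]; b[0].g = state[6]; b[0].h = state[7];

    /* Compressor: stage i feeds stage i+1, as in the rounds above. */
    for (i = 0; i < 64; i++) {
        uint32_t t1, t2;
        b[i].K = K256[i];
        b[i].W = W[i];
        t1 = b[i].h + SIGMA1(b[i].e) + Ch(b[i].e, b[i].f, b[i].g) + b[i].K + b[i].W;
        t2 = SIGMA0(b[i].a) + Maj(b[i].a, b[i].b, b[i].c);
        b[i + 1].a = t1 + t2;
        b[i + 1].b = b[i].a; b[i + 1].c = b[i].b; b[i + 1].d = b[i].c;
        b[i + 1].e = b[i].d + t1;
        b[i + 1].f = b[i].e; b[i + 1].g = b[i].f; b[i + 1].h = b[i].g;
    }

    /* Feed-forward: add the final stage back into the incoming state. */
    state[0] += b[64].a; state[1] += b[64].b; state[2] += b[64].c; state[3] += b[64].d;
    state[4] += b[64].e; state[5] += b[64].f; state[6] += b[64].g; state[7] += b[64].h;
}

/* Hashes a message of at most 55 bytes, which fits one padded block. */
void sha256_short(const uint8_t *msg, size_t len, uint32_t digest[8])
{
    uint8_t block[64] = {0};
    uint64_t bits = (uint64_t)len * 8;
    int i;

    assert(len <= 55);
    memcpy(block, msg, len);
    block[len] = 0x80;                       /* padding marker bit */
    for (i = 0; i < 8; i++)                  /* 64-bit big-endian length */
        block[63 - i] = (uint8_t)(bits >> (8 * i));
    memcpy(digest, H0, sizeof H0);
    sha256_block(digest, block);
}
```

    Hashing the three-byte message "abc" should give the standard test digest that begins ba7816bf and ends f20015ad. A miner would chain two such compressions over the block header, which this sketch does not attempt.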

    In Verilog, remove the pipeline registers from 3 of every 4 rounds, or from 7 of every 8, and let the combinatorial logic cascade. This reduces the number of pipeline stages and registers by 75% or 87.5% respectively, lowering footprint and dynamic power. The combinatorial path is now longer, doing more work per clock with a lower percentage of routing, setup, and hold delays. The result is a higher hash rate, even with a slower clock.
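
    A small C model can show the idea: the round logic is unchanged, and only the points where the state is latched move. The names round_logic and run_folded are hypothetical, and wk[] stands for precomputed W[i] + K[i] values. This is a behavioral sketch, not a substitute for the Verilog.

```c
#include <stdint.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))
#define SIGMA0(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
#define SIGMA1(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))

struct state { uint32_t a, b, c, d, e, f, g, h; };

/* One round of purely combinational logic; wk is W[i] + K[i]. */
static struct state round_logic(struct state s, uint32_t wk)
{
    uint32_t t1 = s.h + SIGMA1(s.e) + Ch(s.e, s.f, s.g) + wk;
    uint32_t t2 = SIGMA0(s.a) + Maj(s.a, s.b, s.c);
    struct state n = { t1 + t2, s.a, s.b, s.c, s.d + t1, s.e, s.f, s.g };
    return n;
}

/* Runs 64 rounds with a pipeline register only after every `fold`
 * rounds (fold should divide 64). Returns the last registered state
 * and stores the number of register stages used in *stages. */
struct state run_folded(struct state s, const uint32_t wk[64],
                        int fold, int *stages)
{
    struct state reg = s;   /* value held in the pipeline register */
    struct state wire = s;  /* value rippling through unregistered logic */

    *stages = 0;
    for (int i = 0; i < 64; i++) {
        wire = round_logic(wire, wk[i]);
        if ((i + 1) % fold == 0) {
            reg = wire;     /* latch at the fold boundary */
            (*stages)++;
        }
    }
    return reg;
}
```

    Folding by 1, 4, and 8 produces the same final state with 64, 16, and 8 register stages respectively, which is where the 75% and 87.5% figures come from.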

    • 1 year ago
  • TotallyLost
    Contest Holder
    • 1 year ago

    Big gains for a 10-year-old, widely studied and used algorithm. The CME design reduced the Goldstrike 1 LUT count from 49,145 to 46,013 and the register count from 54,674 to 52,428 (95.9% of the original registers, a 4.1% net gain).

    Compacting 8 rounds into 1 shrinks the compressor by about 28,672 registers, so we now have a target implementation size of 52,428 - 28,672 = 23,756 registers (45.3% of CME and 43.5% of Goldstrike 1, a 56.5% net gain). There is a smaller additional gain from also compacting the expander in the same 8:1 shrink.

    This is a 56.5% / 4.1% = roughly 13.8x improvement over CME's effort to reduce register count.

    LUT count reductions are not likely to be quite as substantial, but switching to 7-3 compressors should make them significant, as it opens the door for packing additional logic besides the adders into LUTs. Hand packing functions should have a substantial effect on area, routing length, power, delays, and clock speed.
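
    One way to arrive at the 28,672 register figure is sketched below. It assumes the pipeline holds 128 single-round stages (two chained SHA-256 compressions) and that each stage registers eight 32-bit working variables. Neither assumption is stated in this thread, but together they reproduce the number. The function names are hypothetical.

```c
/* Flip-flops in one pipeline stage: eight 32-bit variables a..h. */
#define REGS_PER_STAGE (8 * 32)

/* Registers removed when `rounds` one-round stages are folded fold:1. */
long regs_removed(long rounds, long fold)
{
    long stages_kept = rounds / fold;
    return (rounds - stages_kept) * REGS_PER_STAGE;
}

/* Remaining register count in tenths of a percent of the start count. */
long remaining_permille(long start_regs, long removed)
{
    return (start_regs - removed) * 1000 / start_regs;
}
```

    Under those assumptions, an 8:1 fold keeps 16 of 128 stages and removes 112 x 256 = 28,672 registers, leaving 23,756 of the 52,428 CME registers, or about 45.3%.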

    • 1 year ago
