Developing FPGA-accelerated cloud applications with SDAccel: Theory

Name: Developing FPGA-accelerated cloud applications with SDAccel: Theory
Rating: 4.740259740259741 (77 reviews)

Instructor: Marco Domenico Santambrogio

4,161 already enrolled

6 modules

Gain insight into a topic and learn the fundamentals.

77 reviews

Intermediate level

Recommended experience

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

The theory on how to develop FPGA-accelerated applications with SDAccel.

Details to know

Shareable certificate

Assessments

7 assignments

Taught in English

There are 6 modules in this course

This course is for anyone passionate about learning how to develop FPGA-accelerated applications with SDAccel!

We are entering an era in which technological progress induces paradigm shifts in computing! Between the two extremes of general-purpose processors (GPPs) and application-specific integrated circuits (ASICs) lies a different idea of computing: reconfigurable computing, which combines advantages of both worlds. Reconfigurable computing will gradually and pervasively impact human lives, so it is time to focus on how reconfigurable computing and reconfigurable system design techniques can be used to build applications. On the one hand, a reconfigurable implementation can outperform a software implementation, at the price of a longer implementation time. On the other hand, a reconfigurable device can be used to design a system without the design time and complexity of a full custom solution, although the full custom solution will beat it in performance.

Within this context, the Xilinx SDx tools, including the SDAccel environment, the SDSoC environment, and Vivado HLS, provide an out-of-the-box experience for system programmers looking to partition elements of a software application to run in an FPGA-based hardware element, and to have that hardware work seamlessly with the rest of the application running in a processor or embedded processor. The out-of-the-box experience will provide “good enough” results for many applications. However, this may not be true for you: you may be looking for better performance, higher data throughput, reduced latency, or lower resource usage. This course focuses on exactly this. After introducing FPGAs, we dig into the details of how to use Xilinx SDAccel, with working examples of how to optimise the hardware logic to get the best out of your hardware implementations. In this case, certain attributes, directives, or pragmas can be used to direct the compilation and synthesis of the hardware kernel, or to optimise the function of the data mover operating between the processor and the hardware logic. Furthermore, this course focuses on distributed, heterogeneous infrastructures, presenting how to bring your solutions to life using Amazon EC2 F1 instances.

Since the mid-1980s, reconfigurable computing has become a popular field thanks to progress in FPGA technology. An FPGA is a semiconductor device containing programmable logic components and programmable interconnects, but with no instruction fetch at run time; that is, FPGAs do not have a program counter. In most FPGAs, the logic components can be programmed to duplicate the functionality of basic logic gates or functional Intellectual Properties (IPs). FPGAs also include memory elements composed of simple flip-flops or more complex blocks of memory. Hence, FPGAs have made possible the dynamic execution and configuration of both hardware and software on a single chip. This module provides a detailed description of FPGA technologies, starting from a general overview and moving down to the low-level configuration details of these devices, the bitstream composition, and the configuration registers.

What's included

9 videos, 2 assignments

9 videos Total 57 minutes

Reconfigurable Computing and FPGA technologies 5 minutes
FPGA-based systems and reconfiguration 4 minutes
Programmable System-on-Multiple Chips 8 minutes
Programmable System-on-Chips 4 minutes
FPGAs main building blocks 7 minutes
How to program an FPGA: bitstream and configuration 6 minutes
How to program an FPGA: system description and physical design 8 minutes
CAD Tools for FPGA-based systems design 6 minutes
An introduction to the SDx development environment 9 minutes

2 assignments Total 70 minutes

QUIZ 1 40 minutes
QUIZ 2 30 minutes

The Xilinx SDAccel Development Environment lets the user express kernels in OpenCL C, C++, and RTL (for example SystemVerilog, Verilog, or VHDL) to run on Xilinx programmable platforms. The programmable platform is composed of (1) the SDAccel Xilinx Open Code Compiler (XOCC), (2) a Device Support Archive (DSA), which describes the hardware platform, (3) a software platform, (4) an accelerator board, and (5) last but not least, the SDAccel OpenCL runtime. Within this module, after an introduction to OpenCL, we are going to see how this language has been used in SDAccel and the main "components" of this toolchain.
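Since this module's lectures introduce the OpenCL computational model in terms of global and local sizes, the arithmetic behind those two numbers can be sketched in a few lines of host-side C++. This is only an illustration for a one-dimensional NDRange with a zero global offset; the names `WorkItem`, `num_groups`, `decompose`, and `global_id_of` are hypothetical helpers, not part of OpenCL or SDAccel.

```cpp
#include <cassert>
#include <cstddef>

// Position of one work-item inside a one-dimensional NDRange.
struct WorkItem {
    std::size_t group_id;  // index of the work-group
    std::size_t local_id;  // index of the work-item inside its work-group
};

// Number of work-groups in a 1-D NDRange.
// OpenCL 1.x requires the global size to be a multiple of the local size.
std::size_t num_groups(std::size_t global_size, std::size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

// Split a global id into (group id, local id), assuming a zero global offset.
WorkItem decompose(std::size_t global_id, std::size_t local_size) {
    assert(local_size > 0);
    return WorkItem{global_id / local_size, global_id % local_size};
}

// Inverse mapping: global_id = group_id * local_size + local_id.
std::size_t global_id_of(const WorkItem& w, std::size_t local_size) {
    return w.group_id * local_size + w.local_id;
}
```

For instance, a global size of 1024 with a local size of 64 yields 16 work-groups, and the work-item with global id 130 is local item 2 of work-group 2.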

What's included

7 videos, 1 reading, 1 assignment

7 videos Total 37 minutes

Hardware Design Flow 6 minutes
An introduction to SDAccel and the OpenCL-based flow 6 minutes
OpenCL computational model: global and local sizes 4 minutes
Not only OpenCL! The Rationale behind the RTL and C flows 5 minutes
SDAccel memory model 5 minutes
SDAccel "emulations" 5 minutes
SDAccel runtime 5 minutes

1 reading Total 120 minutes

SDAccel Environment Programmers Guide 120 minutes

1 assignment Total 30 minutes

QUIZ 3 30 minutes

Within this module, before getting into optimisation, we will first understand how an FPGA works, also from a computational point of view. Although the traditional FPGA design flow is more similar to that of a regular IC than to that of a processor, an FPGA provides significant cost advantages compared with an IC development effort and offers the same level of performance in most cases. Another advantage of the FPGA over the IC is its ability to be dynamically reconfigured. This process, which is analogous to loading a program in a processor, can affect part or all of the resources available in the FPGA fabric. Compared with processor architectures, the structures that comprise the FPGA fabric enable a high degree of parallelism in application execution. The custom processing architecture generated by SDAccel for an OpenCL kernel presents a different execution paradigm, and this must be taken into account when deciding to port an application from a processor to an FPGA. To better understand such a scenario, we will briefly compare the sequential execution of a processor with the intrinsically parallel nature of an FPGA implementation.

Furthermore, within this module we are going to familiarise ourselves with the application optimisation flow. The Xilinx SDAccel Environment is a complete software development environment for creating, compiling, and optimising OpenCL applications to be accelerated on Xilinx FPGAs. From a designer's perspective, the flow for optimising an application in the SDAccel Environment can be organised in three phases: (1) baselining functionality and performance, (2) optimising data movement, and (3) optimising kernel computation.
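The difference between sequential and overlapped execution can be made concrete with the usual first-order cycle-count model. The sketch below is an idealised illustration only: it assumes every iteration has the same latency and that a new iteration can start every `ii` cycles (the initiation interval), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Cycles for n iterations executed strictly one after another,
// each iteration taking `latency` cycles.
std::uint64_t sequential_cycles(std::uint64_t n, std::uint64_t latency) {
    return n * latency;
}

// Cycles for n iterations when a new iteration is started every `ii` cycles:
// the first result appears after `latency` cycles, and each of the
// remaining n - 1 iterations completes `ii` cycles after the previous one.
std::uint64_t pipelined_cycles(std::uint64_t n, std::uint64_t latency,
                               std::uint64_t ii) {
    assert(ii >= 1);
    if (n == 0) return 0;
    return latency + (n - 1) * ii;
}
```

Under these assumptions, 100 iterations with a latency of 4 cycles take 400 cycles sequentially, but 103 cycles when overlapped with an initiation interval of 1.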

What's included

5 videos, 1 reading, 1 assignment

5 videos Total 37 minutes

Introduction 6 minutes
FPGA Parallelism vs Processor Architecture 1/2 7 minutes
FPGA Parallelism vs Processor Architecture 2/2 8 minutes
Scheduling, Pipelining, and Dataflow 8 minutes
Application Optimization Flow 7 minutes

1 reading Total 90 minutes

SDAccel Environment Profiling and Optimisation Guide 90 minutes

1 assignment Total 30 minutes

QUIZ 4 30 minutes

In this module we will provide a bird's eye view of the available SDAccel optimisations. The optimisations presented are not the only ones available; rather, they are a list of recommendations for improving the performance of an OpenCL application, to be used as a starting point for ideas to consider or investigate further. Within this context we will organise these “recommendations” in three sets of optimisations: (1) arithmetic optimisations, (2) data-related optimisations, and finally (3) memory-related optimisations.
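One of the ideas named in this module's lecture titles is using the full AXI data width. As a rough illustration of why interface width matters, the sketch below counts how many bus beats are needed to move a buffer over interfaces of different widths. The helper name `beats_needed` is hypothetical, and the 32-bit and 512-bit widths in the usage note are assumptions chosen for the example rather than figures taken from the course.

```cpp
#include <cassert>
#include <cstdint>

// Number of bus beats needed to move `bytes` bytes over an interface
// that is `bus_bits` wide (one beat transfers bus_bits / 8 bytes).
std::uint64_t beats_needed(std::uint64_t bytes, std::uint64_t bus_bits) {
    assert(bus_bits >= 8 && bus_bits % 8 == 0);
    const std::uint64_t bytes_per_beat = bus_bits / 8;
    return (bytes + bytes_per_beat - 1) / bytes_per_beat;  // ceiling division
}
```

Moving 4096 bytes takes 1024 beats over a 32-bit interface but only 64 beats over a 512-bit one, which is why widening the kernel's memory interface is a common first step before tuning the computation itself.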

What's included

6 videos, 2 readings, 1 assignment

6 videos Total 34 minutes

A bird's eye view on SDAccel optimizations 9 minutes
Interface optimizations: Overall context and an overview of a typical target architecture 6 minutes
Interface optimizations: a first example 6 minutes
Burst data transfer 4 minutes
Using full AXI data width 5 minutes
Using multiple memory banks 3 minutes

2 readings Total 210 minutes

SDAccel Environment Profiling and Optimisation Guide 120 minutes
Source Codes 90 minutes

1 assignment Total 30 minutes

QUIZ 5 30 minutes

After an overall description of possible optimisations, within this module we will focus on four specific optimisations: (1) loop unrolling, (2) loop pipelining, (3) array partitioning, and (4) host optimisations. First, we will describe loop unrolling, which unrolls the loop iterations so that the number of iterations decreases while the loop body performs more computation. This technique exposes additional instruction-level parallelism that Vivado HLS can exploit to implement the final hardware design. After that we will present the loop pipelining optimisation, where we move from a sequential execution of the loop iterations to a pipelined execution in which the loop iterations overlap in time. We will then present the array partitioning optimisation, which optimises the usage of BRAM resources in order to improve the performance of the kernel. Finally, at the end of this module we are going to discuss optimisations related to the host system, which is responsible for transferring data to and from the FPGA board, as well as for sending the command that starts the execution of a kernel.
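To make loop unrolling and array partitioning concrete, here is a minimal sketch in plain C++ that applies both by hand. In Vivado HLS these transformations are normally requested through directives such as `#pragma HLS UNROLL factor=4` and `#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4` rather than written manually; the array length, the factor of 4, and all function names below are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 16;      // illustrative array length
constexpr std::size_t FACTOR = 4;  // illustrative unroll / partition factor

// Rolled loop: N iterations, one addition per iteration.
int sum_rolled(const int (&a)[N]) {
    int acc = 0;
    for (std::size_t i = 0; i < N; ++i) acc += a[i];
    return acc;
}

// Loop unrolled by hand with a factor of 4: N / 4 iterations, each doing
// four independent additions that hardware could perform in parallel.
int sum_unrolled(const int (&a)[N]) {
    static_assert(FACTOR == 4 && N % FACTOR == 0, "body is written for factor 4");
    int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (std::size_t i = 0; i < N; i += FACTOR) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}

// Cyclic partitioning: element i is stored in bank i % factor,
// at position i / factor inside that bank.
std::size_t bank_of(std::size_t i, std::size_t factor) {
    assert(factor > 0);
    return i % factor;
}

std::size_t offset_of(std::size_t i, std::size_t factor) {
    assert(factor > 0);
    return i / factor;
}
```

The two techniques complement each other: the four reads in one unrolled iteration (`a[i]` to `a[i + 3]`) fall into four different banks under a cyclic partition with the same factor, so they need not compete for the ports of a single memory block.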

What's included

6 videos, 2 readings, 1 assignment

6 videos Total 43 minutes

Kernel optimization: loop unrolling 1/2 6 minutes
Kernel optimization: loop unrolling 2/2 6 minutes
Kernel optimization: loop pipelining 10 minutes
Kernel optimization: array partitioning 1/2 8 minutes
Kernel optimization: array partitioning 2/2 7 minutes
Host optimizations 6 minutes

2 readings Total 180 minutes

SDAccel Environment Profiling and Optimisation Guide 90 minutes
Source Codes 90 minutes

1 assignment Total 30 minutes

QUIZ 6 30 minutes

What's included

3 videos, 1 reading, 1 assignment

Instructor

Instructor ratings

(16 ratings)

Marco Domenico Santambrogio

Politecnico di Milano

5 Courses 24,205 learners

Offered by

Politecnico di Milano

Learner reviews

5 stars
76.62%
4 stars
22.07%
3 stars
0%
2 stars
1.29%
1 star
0%

Showing 3 of 77

Reviewed on Jan 16, 2020

A very nice introduction course to give you a detailed look at how FPGA can be used to accelerate software applications.

Reviewed on Jun 20, 2020

It is a good course to know the basic of Xilinx sdaccel with a bit more inclination towards the history of the development of FPGA.

Reviewed on Jul 21, 2019

Industry standards are met, a good course to start from basic
