News: COE Public Seminar - Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication - Dr. Ahmad Khayyat


College of Computer Sciences and Engineering
 
 
Computer Engineering Department
 
Presents Public Seminar
 
 
Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication
 
 
Date: Wednesday, Nov 23, 2016
Time: 02:30 pm – 03:30 pm
Location: Building 22, Room 119
 
Speaker:
Dr. Ahmad Khayyat
Asst. Professor, COE Department
 
Abstract: This talk describes the design of an efficient and flexible implementation of parallel, floating-point matrix multiplication for FPGA devices. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, and reduces the bandwidth requirements across the device boundaries by reusing available elements. Exploiting the inherent parallelism in the matrix multiplication computation improves the performance and utilizes the available reconfigurable FPGA resources.
The considered design decisions include the scheduling of block transfers, the scheduling of arithmetic operations, the extent to which the parallelism is exploited, determining the block sizes and shapes, and the use of double buffers for storing matrix blocks. The choices offered by each decision are evaluated both analytically and experimentally.
 
A VHDL implementation is used to verify the correctness of the design and to confirm the analysis of the design decisions. Correctness is verified both by simulation and on FPGA hardware. Experimental results show that the design's performance scales linearly with respect to the consumed resources. For instance, with 8 floating-point arithmetic units, the system achieves 4 GFLOPS, whereas with 64 arithmetic units, it achieves 16 GFLOPS. It is also shown that using a transfer schedule based on inner products reduces the transfer time by up to 50% compared to other schedules. Although square blocks minimize the number of required block multiplications, non-square blocks minimize the transfer time, resulting in better total times.
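
For readers unfamiliar with blocking, the following is a minimal software sketch of blocked matrix multiplication. It is only an illustration of the general technique and not the speaker's VHDL design; the function name blocked_matmul and the parameters block_rows, block_inner, and block_cols are hypothetical names chosen for this example. The block dimensions are independent, so non-square blocks can be tried as well.

```python
def blocked_matmul(a, b, block_rows, block_inner, block_cols):
    """Multiply a (n x k) by b (k x m), both non-empty lists of rows,
    one block at a time. Hypothetical illustration only."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block_rows):           # block row of A and C
        for j0 in range(0, m, block_cols):       # block column of B and C
            for p0 in range(0, k, block_inner):  # blocks along the shared dimension
                # Only one block each of A, B and C is touched in this step,
                # which is what lets a small local memory handle large matrices.
                for i in range(i0, min(i0 + block_rows, n)):
                    for j in range(j0, min(j0 + block_cols, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + block_inner, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    # Non-square blocks: 1 x 2 blocks of A, 2 x 1 blocks of B.
    print(blocked_matmul(a, b, 1, 2, 1))  # [[58.0, 64.0], [139.0, 154.0]]
```

The three outer loops decide the order in which blocks are visited. In a hardware setting that order would correspond to a block transfer schedule, although this sketch does not model transfers or parallel arithmetic units.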
 
Biography: Dr. Ahmad Khayyat is an assistant professor in the Computer Engineering Department at King Fahd University of Petroleum and Minerals (KFUPM). He received his PhD and MScE (Master of Science in Engineering) degrees from Queen’s University, Canada, in 2013 and 2007, respectively, and his BS degree from KFUPM in 2002 with first honors. His interests include FPGA-based accelerator architectures, high-level hardware description, and verifiable embedded systems. He is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the IEEE Computer Society.
 
 
 
All faculty, researchers and graduate students are invited to attend.
 
 
Computer Engineering Department, College of Computer Sciences and Engineering
Telephone: +966 (13) 860 2110, Email: c-coe@kfupm.edu.sa, Website: www.kfupm.edu.sa/departments/coe/
Copyright © 2014 King Fahd University of Petroleum & Minerals
 
 

Created at 11/23/2016 8:11 AM by Webmaster of CCSE website CCSE
Last modified at 11/23/2016 8:14 AM by Webmaster of CCSE website CCSE