# Support of Cross Calls between Microprocessor and FPGA in CPU-FPGA Coupling Architecture



### G. NguyenThiHuong and Seon Wook Kim

Microarchitecture and Compiler Laboratory School of Electrical Engineering Korea University



Compiler & Microarchitecture Lab



# Motivation

```
void process (struct data* head)
struct data* p;
int ret = 0:
for( p = head; p; p = p - next){
p->content = (struct elem*) calloc (p->size);
if( !p->content ){
ret = 1:
break:
} else{
return ret;
struct data* head:
int main (void) {
error = process (head);
. . . . .
```



Many code sections are executed more efficiently in microprocessor: floating intensive codes, system calls, memory management functions in FPGA, the FPGA should be able to call back to microprocessor as a master component.



### **Previous work**

- Away from code coordination between CPU and FPGA
  - Handel-C, Impulse C
  - OCPIP, AMBA
- Support nested and recursive only in hardware side
  - ASH (M. Budiu ASPLOS '04), HybridThreads (E. Anderson-ERSA '07)
  - $\,\circ\,$  Do not allow hardware to call software
- Allows hardware to return back to software for software code execution
  - <u>Comrade (H. Lange-FPL '07)</u>

Do not support to support the crossicallisits in FPGA between SW and HW without any limitation!



# GCC2Verilog approach

• GCC2Verilog: A C-to-Verilog translator based on GCC compiler

- Including a Verilog backend to generate Verilog code from GCC's RTL
- Making hardware follows software calling convention
  - Software and hardware share one stack space.
    - Arguments passing through argument registers and stack.
  - Preserve software stack layout when performing calls in hardware side.
- Supporting:
  - Unlimited nesting calls in hardware including recursive calls.
  - Unlimited nesting cross calls between software and hardware.

Any hardware function in FPGA can be a master in the system!



## Contents

- Compilation and Execution Model
- Address Resolution
- Additional Components
- Cross Calling Convention
- Experiment Results
- Conclusion



### **GCC2Verilog: Compilation & Execution Model**



- Code partitioning process:
  - Divides codes into hardware and software sections
  - Prepares the address resolution
- Compilation process:
  - Compiles software code section into executable objects
  - Translates hardware code section into Verilog code and synthesizes them to HW bitstreams (HWIPs).
- Execution process:
  - Running SW executable code in a microprocessor & HWIPs in FPGA
  - The FPGA communicates with the host processor through a communication channel and memory.



## Address Resolution

- Hardware address resolution:
  - Assigning an hardware identification number hwid to each HWIP
- Software address resolution:
  - Static link: use the symbol table obtained an executable file to resolve software addresses at HLL-to-HDL translation.
  - Dynamic link:
    - Assign an identification number *swid* to each SW callee called from HW
    - Use an address\_resolver() to obtain SW callee address at run time from swid



UNIVERSITY

# **Additional Components**

#### • HW controller:

 Controls and schedules the execution between a processor and HWIPs

#### • SW/HW interface:

- Provides a uniform interface to communicate with the host processor
- HW register set: set of registers for calls:
  - Argument registers
  - o HW stack pointer
  - o Link register





# Software Calls Hardware

1. The wrapper function passes arguments, and calls the HW callee

2. HW controller enables the HW callee reads its arguments, and starts to



KORFA

UNIVERSITY

### Hardware Callee Returns to Software Caller

4. HW controller interrupts the host processor when the HW callee finishes5. The interrupt handler notifies the HW finishing to the wrapper





## Hardware Calls Software

1. HW caller passes arguments and notifies to the controller about the call 3. The interrupt handler resolves the SW callee's actual address from 2. HW controller interrupts the processor with SW callee ID swid & the wrapper calls the function.





## Hardware Calls Software

4. SW callee executes its code & returns to the wrapper when finish





### Software Callee Returns to Hardware caller

5. The wrapper notifies to HW controller about SW finish6. The HW caller is enabled again to continue its execution





### Hardware Calls Hardware





### Hardware Calls Hardware



UNIVERSITY

# **Experiment Result**

- Experiment setup
  - Host processor: ARM922T
  - Benchmarks: EEMBC + factorial (recursion)
- Calling overhead:
  - Cross calls between SW and HW (exclude interrupting time)
    - Static link: 99 cycles
    - Dynamic link: 125 cycles
  - o Calls among HWIPs:
    - Less than 5 cycles



# **Experiment Result**

| Benchmarks | Number of calls | Call overhead (%) |
|------------|-----------------|-------------------|
| aifftr     | 300             | 3.52              |
| aiifft     | 300             | 4.00              |
| fft        | 100             | 2.71              |
| bezier     | 20              | 0.11              |
| idctrn     | 600             | 4.62              |
| rgbyiq     | 10              | 0.02              |
| viterb     | 200             | 8.37              |
| autcor     | 100             | 0.05              |
| factorial  | 10              | 19.91             |

### Call overhead including interrupt time



# Conclusion

- Novel method to fully support cross calls among microprocessor and FPGA
  - $\circ$  Allowing FPGA to perform calls back to a microprocessor
  - $\circ\,$  Supporting unlimited nested and recursive calls in FPGA
- Reasonable cross calling overhead
- An importance step toward the full automatic translation of HLL to HDL
- Implemented a C-to-Verilog translator based on GCC compiler



### **Questions & Answers**

