| Problem | OpenPEOPLE | Power Modelling<br>○ | Hardware Accelerated Block<br>0000000 | Reconfiguration |
|---------|------------|----------------------|---------------------------------------|-----------------|
|         |            |                      |                                       |                 |

Modelling and Optimization of Power Consumption in Reconfigurable Devices

Robin BONAMY

PhD since Oct. 2009 (ANR) CAIRN - IRISA

Advisor: Daniel CHILLET (IRISA)

Co-advisors: Sébastien BILAVARN and Olivier SENTIEYS (IRISA)

Journées Scientifiques MCSOC, June 30th 2011







# System on Chip

SoC: a chip integrating various functions

- Processor(s)
- Configurable area(s)
- DSP(s)
- Peripheral(s)
- Memory(s)
- Analog

Examples: TI OMAP, Xilinx Zynq-7000, Actel SmartFusion



- SoCs are increasingly used
  - Size
  - Cost

Power consumption

- Battery size/weight
- Dissipation
- Power rails design





# Need to have early power estimation for heterogeneous systems

# Outline

- 1 Open PEOPLE
- 2 Power Modelling
- 3 Hardware Accelerated Block
- 4 Reconfiguration


# OpenPEOPLE: Who?

#### Consortium:

- Lab-STICC UBS
- LORIA INRIA Nancy
- Dart INRIA Lille
- LEAT UNSA
- IRISA-Cairn UR1
- THALES Coms. Colombes
- InPixal Rennes

Project funded by the ANR




# OpenPEOPLE: What is it?

#### Open-PEOPLE :

Open-Power and Energy Optimization PLatform and Estimator




# OpenPEOPLE: What for?

#### Complete platform to

- allow rapid power and energy estimation and measurement for complex heterogeneous systems
- test the effects of different optimizations on power consumption

CAIRN: Power consumption models for hardware tasks and reconfiguration

LEAT: Scheduler, OS aspects




# CMOS Power Consumption [Julien06], [Garcia99], [TI97]

- Dynamic Power
  - Voltage (squared)
  - Frequency
  - Activity
  - Load (Nets Capacitance)
- Static Power (Leakage)
  - Area, Occupation Rate
  - Voltage

$$P_d = V_{cc}^2 \times F \times \alpha \times C$$

$$P_s = V_{cc} \times A$$
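As a worked illustration of the two expressions above, the following Python sketch evaluates them for one set of inputs. The numeric values (supply voltage, clock frequency, activity factor, capacitance and the leakage coefficient `A`) are placeholders chosen only for the example; they are not measurements from this work.

```python
def dynamic_power(v_cc, freq_hz, activity, capacitance_f):
    """P_d = V_cc^2 * F * alpha * C, in watts."""
    return v_cc ** 2 * freq_hz * activity * capacitance_f


def static_power(v_cc, leakage_coeff):
    """P_s = V_cc * A, where A lumps the area-dependent leakage (in amperes)."""
    return v_cc * leakage_coeff


# Placeholder inputs (assumptions for illustration only).
p_d = dynamic_power(v_cc=1.0, freq_hz=100e6, activity=0.25, capacitance_f=2e-9)
p_s = static_power(v_cc=1.0, leakage_coeff=0.5)
print(f"dynamic: {p_d:.3f} W, static: {p_s:.3f} W")
```

Because of the squared term, halving `v_cc` in this sketch divides the dynamic term by four but the static term only by two.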


# Power Modelling

- Fine Grain
  - Activity
  - Net length
  - Flip-flops
  - Operators
- Coarse Grain
  - Application parameters
  - RAM
  - Area
  - Delay
- State Machine models [Benini00]
- Peak consumption [Gupta03]
- Glitches [Ragh96]
- ...

Problem: power estimation is done late

Problem: power estimation accuracy is poor

# Model?

What is a model?

- P/E = f(parameters): power or energy expressed as a function of parameters

How to build a model?

- Measurements while varying the parameters
- Analysis, Statistics
- Verification

# Platform: Xilinx ML550 Board

- FPGA Virtex-5 VLX50T (7200 slices)
- SystemACE CompactFlash controller
- 5 power rails (core, IOs, peripherals)
- Current sense resistors



Previous work on Actel IGLOO.

Future work planned on Virtex 6, Altera Stratix.


# Hardware Blocks

- Unload the processor core
- Power/Energy/Throughput efficiency [AlteraAN531]
- Parallelism level

# C to VHDL

Effect of parallelism level on Power consumption

- Generation of hardware blocks
- High level synthesis (C to VHDL)
- Loop unrolling
- PLB block, Microblaze at 100MHz




## Measurement Protocol





- Idle power consumption
- Active power consumption
- Execution Time


# Matrix Multiplication C code

Loop Unrolling Index (LUI) notation :  $LUI_1, LUI_2, LUI_3$ 





Figure 1: Energy of the matrix multiplication versus execution time for the unrolling configurations $(LUI_1, LUI_2, LUI_3)$. Trend: $E = A + B \times t$.
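A trend of the form $E = A + B \times t$ can be extracted from (time, energy) measurements by ordinary least squares. The sketch below shows one way to do it in plain Python; the sample points are synthetic (they lie exactly on an arbitrary line) and are not the data behind Figure 1.

```python
def fit_linear_trend(times, energies):
    """Ordinary least-squares fit of E = A + B * t; returns (A, B)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_e = sum(energies) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(times, energies))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    intercept = mean_e - slope * mean_t
    return intercept, slope


# Synthetic points lying exactly on E = 10 + 50 * t (made-up values).
times = [0.4, 0.6, 0.8, 1.0]
energies = [10 + 50 * t for t in times]
a, b = fit_linear_trend(times, energies)
print(f"A = {a:.2f}, B = {b:.2f}")
```

In such a fit, `B` has the dimension of a power (energy per unit of time) and `A` that of an energy.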

# Two Other Algorithms

Table 1: Comparison of execution time and energy between three different implementations of each algorithm.

|                   | Matrix mult. Time (ms) | Matrix mult. Energy (mJ) | Full Search Time (ms) | Full Search Energy (mJ) | Deblock. filter Time (ms) | Deblock. filter Energy (mJ) |
|-------------------|------------------------|--------------------------|-----------------------|-------------------------|---------------------------|-----------------------------|
| Soft              | 10.75                  | 244.79                   | 0.4786                | 18.2                    | 0.5742                    | 26.09                       |
| HardS             | 1.04                   | 61.99                    | 0.0369                | 2.01                    | 0.0529                    | 3.68                        |
| HardP             | 0.38                   | 27.48                    | 0.0246                | 1.05                    | 0.0417                    | 2.74                        |
| Soft/HardP Ratio  | 28.29                  | 8.91                     | 19.37                 | 17.33                   | 13.77                     | 9.52                        |
| HardS/HardP Ratio | 2.74                   | 2.26                     | 1.50                  | 1.91                    | 1.26                      | 1.34                        |

Soft represents an execution on the MicroBlaze core, HardS represents a sequential implementation of the hardware task, HardP represents the best parallelized solution in terms of time.
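The ratio rows of Table 1 are simple quotients of the rows above them. The following Python sketch recomputes them from the table entries; the dictionary layout and the function name `ratios` are choices made for this example. Since the table entries are themselves rounded, a recomputed ratio may differ in its last digits from the printed one.

```python
# Values copied from Table 1: (time, energy) per implementation.
TABLE_1 = {
    "Matrix mult.": {"Soft": (10.75, 244.79), "HardS": (1.04, 61.99), "HardP": (0.38, 27.48)},
    "Full Search": {"Soft": (0.4786, 18.2), "HardS": (0.0369, 2.01), "HardP": (0.0246, 1.05)},
    "Deblock. filter": {"Soft": (0.5742, 26.09), "HardS": (0.0529, 3.68), "HardP": (0.0417, 2.74)},
}


def ratios(algorithm, reference, target="HardP"):
    """Return (time ratio, energy ratio) of `reference` over `target`."""
    ref_t, ref_e = TABLE_1[algorithm][reference]
    tgt_t, tgt_e = TABLE_1[algorithm][target]
    return ref_t / tgt_t, ref_e / tgt_e


for name in TABLE_1:
    speedup, energy_gain = ratios(name, "Soft")
    print(f"{name}: Soft/HardP time x{speedup:.2f}, energy x{energy_gain:.2f}")
```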

# Conclusion

Energy is not a constant

$\rightarrow$ When space is available, exploit parallelism of each task

$\nearrow$ parallelism level

$\nearrow$ area

$\nearrow$ bitstream

$\rightarrow$ Impact of reconfiguration?




# Dynamic Partial Reconfiguration



Figure: Partial Reconfiguration segmentation example

Partial reconfiguration (PR) is studied for space and energy saving [Savary07], [Becker03].

No detailed model exists for PR power consumption.

A MicroBlaze soft core and a hardware task.



# PR Steps

- (1) the reconfiguration order arrives
- (2) open the bitstream file
- (3) read the bitstream header
- (4) check the header's validity
- (5) read a bitstream sector
- (6) write the data to the ICAP (Internal Configuration Access Port)
- (7) repeat (5) and (6) until the end of the bitstream
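The steps above can be sketched as a simple read/write loop. The Python below runs against in-memory stand-ins rather than real hardware: the 4-byte header marker `BIT0`, the 512-byte sector size and the `write_to_icap` callback are all hypothetical, introduced only to make the control flow concrete. An actual implementation would rely on the platform's storage and ICAP drivers.

```python
import io

SECTOR_SIZE = 512          # assumed sector size, for illustration
HEADER_MAGIC = b"BIT0"     # hypothetical header marker, not a real bitstream format


def partial_reconfigure(bitstream_file, write_to_icap):
    """Steps (3) to (7): validate the header, then stream sectors to the ICAP."""
    header = bitstream_file.read(len(HEADER_MAGIC))      # (3) read the header
    if header != HEADER_MAGIC:                           # (4) check its validity
        raise ValueError("invalid bitstream header")
    written = 0
    while True:
        sector = bitstream_file.read(SECTOR_SIZE)        # (5) read one sector
        if not sector:                                   # (7) stop at end of bitstream
            break
        write_to_icap(sector)                            # (6) write the data to the ICAP
        written += len(sector)
    return written


# Usage with in-memory stand-ins for the opened bitstream file (2) and the ICAP.
icap_log = []
fake_bitstream = io.BytesIO(HEADER_MAGIC + bytes(1300))
total = partial_reconfigure(fake_bitstream, icap_log.append)
print(total, len(icap_log))
```

The loop makes visible why memory access and the activity of the managing core both contribute to the energy of a reconfiguration: every sector is read by the core and then written to the port.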


# Power Consumption at the Beginning of PR










Figure: Partial reconfiguration of the PRR from one task to another. Bottom: core power consumption during PR.


# Power Consumption during PR



Figure: Partial reconfiguration of the PRR from one task to another. Top: Hamming distance between the bitstreams of Task1 and Task2. Bottom: core power consumption during PR.

# Power Model from Parameters

Parameters that affect energy consumption

- Memory access
- Activity of the managing core (Read and write)
- PRR Area/Bitstream size
- Difference with the previous configuration

Parameters that don't affect energy consumption

- Shape of the Partial Reconfiguration Region
- Bitstream composition

Parameters to study

- Other: memory, management core, ICAP IP, device

Trivial model (Virtex5 VLX50T):

$E \simeq 39\,\mu\mathrm{J}$ per kB, with CF, MicroBlaze at 100 MHz, $V_{core} = 1\,\mathrm{V}$
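As a quick illustration of how the trivial model above could be used, the Python sketch below scales the per-kB figure to a bitstream size. The 100 kB size is a hypothetical value picked for the example, and the estimate only applies to the platform setup stated with the model.

```python
ENERGY_PER_KB_UJ = 39.0  # trivial model: about 39 microjoules per kB of bitstream


def reconfiguration_energy_uj(bitstream_kb):
    """Estimated PR energy in microjoules for a bitstream of the given size in kB."""
    return ENERGY_PER_KB_UJ * bitstream_kb


# Hypothetical 100 kB partial bitstream (size chosen for illustration only).
energy_uj = reconfiguration_energy_uj(100)
print(f"{energy_uj:.0f} uJ = {energy_uj / 1000:.1f} mJ")
```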





# Future Work

#### Tasks

Partial Reconfiguration

Power/Energy = f(bitstream, area, F, ..., V, memory, device, ...)

Application flow graph example.

# Bibliography

- [Julien06] Julien N., Caractérisation et modélisation de la consommation sur FPGA, ECOFAC 2006
- [Garcia99] … Field Programmable Gate Arrays …, … Programmable Logic and Applications …
- [TI97] Texas Instruments, CMOS Power Consumption and Cpd Calculation, 1997
- [AlteraAN531] Altera Corporation, AN 531: Reducing Power with Hardware Accelerators, 2008
- [Savary07] … Architectures Reconfigurables pour … Applications Embarquées, 2007
- [Becker03] … power measurement of Xilinx …, … Integrated Circuits and Systems …
|            |                                     | •                       | power models for controllers.<br>pposium on VLSI, 2000        | In                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|            |                                     |                         | nt per-cycle estimation at RTL<br>IEEE Transactions on, 2003  | Very                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|            | Raghunathan A. an                   | d al., Register-transfe | er level estimation techniques f                              | or                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |

switching activity and power consumption. In Proceedings of the 1996






Figure 2: Representation of power consumption for a hardware task


## Power and Parallelization



Figure 2: Representation of power consumption for a parallelized hardware task
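The trade-off shown in these two figures can be written down under a simple rectangular power profile. This profile is an assumption made here for illustration and is not stated on the slide: the block draws a constant idle power $P_{idle}$ and an additional dynamic power $P_{dyn}$ while the task runs for $T_{exec}$. All symbols are illustrative names.

$$E_1 = (P_{idle} + P_{dyn})\, T_{exec}$$

Assuming an ideal speed-up with $N$ parallel instances (execution time divided by $N$, dynamic power multiplied by $N$), the energy becomes:

$$E_N = (P_{idle} + N\, P_{dyn})\, \frac{T_{exec}}{N} = \frac{P_{idle}\, T_{exec}}{N} + P_{dyn}\, T_{exec}$$

Under these assumptions the dynamic term is unchanged and the gain comes from the shorter time spent drawing idle power. Real speed-ups are usually sub-linear and the extra logic may raise idle power, which is why the measurements matter.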




## Execution time versus Loop Unrolling Index



Figure 3: Matrix multiplication execution time versus total loop unrolling index (*LUI1*, *LUI2*, *LUI3*).
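As a rough reading aid for the figure, here is a minimal sketch of an idealized iteration-count model. It assumes that *LUI1*, *LUI2* and *LUI3* are the unrolling factors of the three nested loops of an n × n matrix multiplication, that the "total" index is their product, and that unrolled iterations execute fully in parallel. The function names are hypothetical, and the model ignores memory bandwidth and control overhead, so it gives only a best-case trend rather than the measured curve.

```python
def ceil_div(a, b):
    """Integer ceiling division (exact, no floating point)."""
    return -(-a // b)


def total_unrolling_index(lui1, lui2, lui3):
    """Assumed definition: product of the three per-loop unrolling factors."""
    return lui1 * lui2 * lui3


def sequential_iterations(n, lui1=1, lui2=1, lui3=1):
    """Best-case number of sequential iterations left after unrolling.

    Each loop of n iterations is assumed to need ceil(n / LUI) sequential
    steps once LUI copies of its body run in parallel.
    """
    for lui in (lui1, lui2, lui3):
        if lui < 1:
            raise ValueError("unrolling factors must be >= 1")
    return ceil_div(n, lui1) * ceil_div(n, lui2) * ceil_div(n, lui3)
```

For example, `sequential_iterations(8)` counts the 512 iterations of a fully sequential 8 × 8 product, while `sequential_iterations(8, 2, 2, 2)` drops to 64 under the ideal-parallelism assumption.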


## Energy consumption versus Loop Unrolling Index



Figure 4: Power consumption measurements: matrix multiplication energy consumption versus total loop unrolling index (*LUI1*, *LUI2*, *LUI3*).
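Energy values like those plotted here are obtained by integrating measured power over the execution time. The sketch below shows one common way to do this, assuming a uniformly sampled power trace. The function name, the sampling scheme and the trapezoidal rule are assumptions made for illustration, not a description of the measurement setup actually used.

```python
def energy_joules(power_w, dt_s):
    """Integrate a uniformly sampled power trace with the trapezoidal rule.

    power_w: sequence of power samples in watts (hypothetical input format)
    dt_s:    sampling period in seconds
    Returns the energy in joules over the sampled window.
    """
    if dt_s <= 0:
        raise ValueError("sampling period must be positive")
    if len(power_w) < 2:
        return 0.0  # a single sample spans no time interval
    total = 0.0
    for p0, p1 in zip(power_w, power_w[1:]):
        total += 0.5 * (p0 + p1) * dt_s
    return total
```

Applying this to the trace of each configuration gives one energy point per loop unrolling setting, which is how a curve of this kind can be built from power measurements.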