Fast Arithmetic on FPGA Using Redundant Binary Apparatus

By Louis-Philippe Lessard

Abtract

The redundant binary representation (RBR) has many advantages over traditional binary representation. This article tries to demonstrate the usefulness of RBR on FPGA. The performance and characteristics of apparatus for arithmetic operations (addition, subtraction and multiplication) and conversions between two’s complement and RBR are evaluated. A faster conversion scheme from RBR to two’s complement and moving average FIR filter are also demonstrated.

Introduction

The redundant binary representation (RBR) is a numeral system that uses more bits than needed to represent a single binary digit so that most numbers have several representations. RBR is unlike usual binary numeral systems, including two's complement, which use a single bit for each digit. Many of RBR's properties differ from those of regular binary representation systems. Most importantly, RBR allows addition without using a typical carry, but makes bitwise logical operation slower. Usually, every bit has a sign that is not necessarily the same as the sign of the number represented. When digits have signs, RBR is also a signed-digit representation (1).

RBR is a place-value notation system (2). In RBR, digits are pairs of bits, that is, for every place RBR uses a pair of bits. The value represented by an RBR digit can be found using a translation table. This table indicates the mathematical value of each possible pair of bits.

As in conventional binary representation, the integer value of a given representation is a weighted sum of the values of the digits. The weight starts at 1 for the rightmost position and goes up by a factor of 2 for each next position. Usually, RBR allows negative values. There is no single sign bit that tells if a RBR represented number is positive or negative. Most integers have several possible representations in a RBR.

An integer value can be converted back from RBR using the following formula, where n is the number of digit and dk is the interpreted value of the k-th digit, where k starts at 0 at the rightmost position:

∑_(k=0)^(n-1)▒〖d_k

Notation Used

The following redundant binary representation is used in this article:

Digit

Represented value

0 0

-1

0 1

0

1 0

0

1 1

1

Table 1 Example of redundant binary translation table

Number value

This notation has advantages not found in other redundant binary representation. It is possible to easily find the additive inverse of a value by flipping all the bits of the represented value using NOT gates. This allows building adder/subtracter unit more easily. (8)

Addition Unit

Addition in redundant binary representation can be done in constant time contrarily to addition in two’s complement notation. This can be explained by the fact that the carry does have to propagate through all the width of the addition unit. This does not imply that the addition is always faster in RBR than is two's complement representation, but that the addition will eventually be faster in RBR with increasing bit width because the two's complement addition unit's delay is proportional to log(n) (where n is the bit width) (3).

Addition unit

Addition unit

z=x+y

Results (Xilinx)

Bearing in mind, it is interesting to compare the performance of typical binary adder unit and redundant binary adder unit considering their bit width. The following results have been obtained by using Altera and Xilinx platform.

delay in adder unit on Xilinx platform

Figure 2 Combinatorial delay for redundant binary adder on Xilinx Virtex 5

Number of bits

xc5vlx85 3ff676
redundant

xc4vlx80 12ff1148
redundant

xc5vlx85-3ff676
Two’s complement

xc4vlx80-12ff1148
Two’s complement

8

1,736

2,368

1,499

1,831

16

1,736

2,368

1,681

2,135

32

1,736

2,368

2,045

2,758

64

1,736

2,368

2,773

3,959

128

1,736

2,368

4,229

6,391

256

1,736

2,368

7,141

15,723

Table 2 Redundant binary adder delay on Xilinx platforms

Those results shows that redundant binary representation become faster than two’s complement representation on Xilinx platform when the operands become larger than 32 bits. Addition being the cornerstone of arithmetic operations, it can be expected that similar result will be obtained for other arithmetic operations. However, it comes at the cost of using about twice as much LUT resource.

Results (Altera)

Delay in adder unit on Altera platform

Figure 3 Combinatorial delay for redundant binary adder on Altera Stratix III

EP3SE80F1152C2 Redundant

EP3SE80F1152C2 Two's complement

0,803

1,225

0,803

1,730

0,803

2,240

0,803

3,254

0,803

4,631

Table 3 Combinatorial delay for redundant binary adder on Altera Stratix III

Performance of RBR adder unit is much better on the Altera Platform. The RBR adder is always faster than the regular two’s complement adder. This can be explained by the fact that Stratix III FPGA architecture is much better at seven inputs to one output combinatorial function (4).

Substraction Unit

The subtraction is the same as the addition except that the additive inverse of one of the operand needs to be found. Essentially, the subtraction is the addition of one of the operand to the additive inverse of the other operand.

Using the notation used in this article, the additive inverse of a value is easily found by inverting every bit of an operand.

z = x-y = x+-y

Multiplication Unit

The multiplication unit evaluated here is made of many adder unit arranged in a tree. The multiplication unit is not pipelined although it could easily be.
Firstly, partials are computed by multiplying every digit of an operand with every digit of the other operand using usual arithmetic (5):

y_k

x_k*y_k

z=x*y

-1

-1

1

-1

0

0

-1

1

-1

0

-1

0

0

0

0

0

1

0

1

-1

-1

1

0

0

1

1

1

Table 4 Rule to compute the partial results

z=x*y

Multiplication example

Multiplication example

Results (Xilinx)

Delay in multiplication unit on Xilinx platform

Figure 4 Combinatorial delay in multiplication unit on Xilinx Virtex 5

Number of bit

xc5vlx85-3ff676
redundant

xc4vlx80-12ff1148
redundant

xc5vlx85-3ff676
Two's Complement

xc4vlx80-12ff1148
Two's Complement

8

5,668

6,312

2,91

4,152

16

6,443

7,713

2,91

4,152

32

7,662

9,049

7,923

8,559

64

8,921

10,623

11,705

13,346

128

10,14

12,319

14,165

18,75

Table 5 Combinatorial delay in multiplication unit on Xilinx Virtex 5

The delay in the multiplier unit as seen in Figure 4 and Table 5 is proportional to log n where n is the bit width of the operands. This is an inherent characteristic of the adder tree used in the design of the multiplier unit.

The performance of a RBR multiplier unit begins to be interesting when the multiplier unit have bit width greater than 32 bit. When this is the case, the delay of the multiplier becomes smaller than the delay of the specialized multiplier circuit of Xilinx FPGAs. However, it comes at a significant cost of LUT (Table 6). Moreover, this design could be easily pipelined leading to a significant lessening of delay. For example, a 64 bits RBR multiplier unit could be implemented as a 6 stages pipeline by adding D flip-flop between each level of the adder tree thus utilizing more of the FPGA resources. Considering this, the frequency is expected to be six times higher. This design would be interesting for a processor which would use only the redundant binary representation.


Number of bit

xc5vlx85-3ff676
Redundant (Number of LUT)

xc4vlx80-12ff1148
Redundant (Number of LUT)

8

272

328

16

1204

1385

32

5009

5632

64

20346

22636

128

81833

90684

Table 6 Resources used by RBR multiplier unit on Xilinx Virtex

Results (Altera)

This multiplication unit does not perform well compared to the built-in multiplication unit on Altera platform. This was to be expected because Stratix III FPGA have hardwired dedicated multiplication unit with native support up 36 bits (6).

Delay in multiplication unit on Altera platform

Figure 5 Delay in Altera multiplication unit

Number of bit

EP3SE80F1152C2
Redundant

EP3SE80F1152C2
Two's Complement

8

3,48

3,35

16

4,78

2,84

32

6,24

3,47

64

10,04

8,70

128

N/A

12,95

Table 7 Delay in Altera multiplication unit

Redundant binary to binary converter

The converter presented here uses Xilinx and Altera specialized carry propagation circuit to accelerate redundant binary to binary conversion. The basic idea is to convert the redundant binary number to two two’s complement number that then can be added with any two’s complement adder (7). This allows using the FPGA resources more efficiently and is most of the time faster than a regular redundant binary to binary converter using a sequential adder.

X being a redundant binary number, it is possible to convert it to 2 two’s complement number:

Those 3 numbers need to be added using standard two’s complement adder. A single adder is needed since ‘1’ can be added as a “carry in”. Thus the specialized adder circuit of FPGAs can be used to make the conversion from RBR to binary representation.

Results

two's complement conversion unit on Xilinx platform

Figure 6 Delay in RBR to two's complement conversion unit on Xilinx platform

Number of bit

xc5vlx85-3ff676
Adder conversion

xc4vlx80-12ff1148
Adder conversion

xc5vlx85-3ff676
Using LUT

xc4vlx80-12ff1148
Using LUT

8

1,682

1,661

1,99

2,186

16

1,864

1,933

2,676

3,22

32

2,228

2,477

3,221

3,913

64

2,956

3,565

4,112

4,676

128

4,412

5,741

4,63

5,329

Table 8 Delay in RBR to two's complement conversion unit on Xilinx platform

converter ressource usage on Xilinx

Figure 7 Resources used by RBR to binary converter on Xilinx platform

xc5vlx85-3ff676
Adder conversion

xc4vlx80-12ff1148
Adder conversion

xc5vlx85-3ff676
Sequential adder using LUT

xc4vlx80-12ff1148
Sequential adder using LUT

9

9

20

26

17

17

48

71

33

33

112

144

65

65

212

323

129

129

456

653

Table 9 Resources used by RBR to binary converter on Xilinx platform

Specialized adder based converters present, for operands up to 128 bits, the best performances in terms of speed and resources used.

FIR Filter : Moving Average

FIR filters are used in a variety of communication applications. FIR filter can be represented as a series of arithmetic operation. Because better arithmetic speed can be achieved with RBR on FPGA, an increase in FIR filter speed is expected. The FIR filter studied here is the moving average.

Moving average formula

The output y is the sum of the n last inputs. V is implemented as a circular buffer. During each cycle a value is read and the new input value is written at the head of the circular buffer. The moving average filter can also be expressed recursively in the following way:

Moving average formula

The last value y is stored in a register so that  can be calculated using the above formula during the next cycle. During each cycle, we need to add V_0 and subtract V_n to y_(i-1). An addition unit and a subtraction unit are used to calculate the next value of y.

delay of moving average filter on Xilinx platform

Figure 8 Maximum delay of moving average filter on Xilinx platform

Number of bit

xc5vlx85-3ff676
RBR

xc4vlx80-12ff1148
RBR

xc5vlx85-3ff676
Two's complement

xc4vlx80-12ff1148
Two's complement

8

2,651

4,264

2,946

4,748

16

2,659

4,79

3,128

5,079

32

2,651

5,192

3,492

5,993

64

2,659

5,219

4,22

7,204

128

2,692

5,518

5,676

8,357

Table 10 Maximum delay of moving average filter on Xilinx platform

Moving average filter delay on Altera

Figure 9 Maximum delay of moving average filter on Altera EP3SE80F1152C2

Number of bit

Two's Complement (ns)

RBR (ns)

8

3,13

2,89

16

3,40

3,33

32

4,08

3,11

64

5,34

3,80

128

7,53

4,11

Table 11 Maximum delay of moving average filter on Altera EP3SE80F1152C2

The result of the redundant binary unit is outstanding. Most notably, the Xilinx 5 platform performs at almost constant time using redundant binary.

Conclusion

The result displayed above show that redundant binary representation is useful to speed up arithmetic operation even on FPGA. Addition, subtraction and multiplication have been show to be faster when RBR is used, but sometime only for bit width of 64 bit and higher. However, this speed comes at a cost which is often acceptable considering the time critical nature of digital filter. Also, it is tolerable to use RBR in only a subset of a large circuit considering that RBR to two’s complement conversion is relatively fast when done properly. Further study on larger circuit using RBR would be interesting. Easy place and route operation are expected because of the symmetric nature of RBR apparatus.

Source Code

I hereby release this source code under GPL.

You can download it here.

The source is in VHDL. moving_avg.vhd is a moving average filter. radder.vhd is a RBR adder. raddsub.vhs is a RBR adder/substractor. rmul.vhd is a RBR multiplier. rconv.vhs is RBR to two's complement converter. tworconv.vhd is a two's complement to RBR converter.

Works cited

1. Panami, Behrooz. Generalized Signed-Digit Number Systems : A Unifying Framework for Redundant Number Representations. IEEE TRANSACTIONS ON COMPUTERS, VOL. 39, NO. 1. pp. 89-98.
2. Wikipedia. Positional Notation. [Online] [Cited: august 18, 2008.] http://en.wikipedia.org/wiki/Positional_notation.
3. Pai, Yu-Ting and Chen, Yu-Kumg. The fastest carry lookahead adder. Electronic Design, Test and Applications. Second IEEE International Workshop, 2004.
4. Altera. Stratix II: 8-Input Fracturable LUT in the ALM. Altera Web site. [Online] [Cited: August 20, 2008.] http://www.altera.com/products/devices/stratix-fpgas/stratix-ii/stratix-ii/features/architecture/st2-lut.html.
5. Guoping Wang, Murad Ozaydin, Monte Tull. High performance divider using redundant binary representation. IEEE. 2002.
6. DSP System Design in Stratix III devices. Altera. [Online] [Cited: october 6, 2008.] http://www.altera.com/literature/an/an504.pdf.
7. Iljoo Choo, R.G. Deshmukh. A Novel Conversion Scheme From a Redundant Binary Number to Two's Complement Binary Number For Parallel Architecture. IEEE. 2001.
8. Systematic Design of Pipelined Recursive Filters. Lapointe, Marcel, Huynh, Huu Tue and Fortier, Paul. s.l. : IEEE TRANSACTIONS ON COMPUTERS, 1993.

Valid XHTML 1.0 Transitional