# CS 3889 Arithmetic Logic Unit 

A.R. Hurson<br>323 CS Building,<br>Missouri S\&T<br>hurson@mst.edu

## Arithmetic Logic Unit

## $>$ Outline

* Motivation
* Design of a simple ALU
* How to design an ALU
* Fast ALU design
- Fast Adder
- Fast Multiplier
- Fast Divider


## Arithmetic Logic Unit

You are expected to be familiar with:

* Representation of numbers,
* Basic arithmetic operations in digital systems, including: addition, multiplication, and division,
* Concept of serial, parallel, and modular ALU

If not then you need to study
CS3889.module4

## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)

* In an attempt to improve the performance, this section will talk about the Arithmetic Logic Unit.
* In regard to our earlier CPU time, we are looking at techniques to reduce $p$.

$$
\mathrm{T}=\mathrm{I}_{\mathrm{C}}{ }^{*} \mathrm{CPI}{ }^{*} \tau=\mathrm{I}_{\mathrm{C}} *\left(\mathrm{p}+\mathrm{m}^{*} \mathrm{k}\right)^{*} \tau
$$

## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)

* It is a functional box designed to perform basic arithmetic, logic, and shift operations on the data.
*. Implementation of the basic operations such as logic, program control, and data transfer operations are easier than arithmetic and I/O operations. Therefore, in this section we concentrate on arithmetic operations.


## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)

* An ALU can be of three types:

OSerial
OParallel (see CS 3889.module4 for definitions and more discussion about serial and parallel ALU)
OFunctional (Modular)

## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)
*. Is it possible to improve the performance of an ALU beyond the performance of a modular ALU?

* Naturally, we can improve the performance (physical speed) by taking advantage of the advances in technology.
*How can we improve the logical speed of the ALU further?


## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)

* In a functional ALU, is it possible to devise algorithms which allow one to improve the performance of the basic operations?
* If this is a valid direction, then the question of how to design a fast ALU will change to "how to design a fast adder, a fast multiplier, ...?"


## Arithmetic Logic Unit

-Question

* As a computer architect, how do you design an ALU? In another words, in an attempt to design an ALU, what issues do you need to take into consideration?


## Arithmetic Logic Unit

>Fast Adder

* How to design an adder faster than a parallel adder?
* What is the major bottle-neck in a parallel adder?
* Is the carry generation and propagation the major bottleneck?
*. Is it possible to eliminate, moderate, or reduce the delay of carry generation and propagation?


## Arithmetic Logic Unit

Arithmetic and Logic Unit (ALU)

* Carry Lookahead
-Scheme 1
OScheme 2
* Carry Select
* Carry Lookahead plus Carry Select


## Arithmetic Logic Unit

-Fast Adder

* Carry Lookahead - Generate and propagate carries ahead of time, relative to a parallel adder.


## Arithmetic Logic Unit

-Fast Adder

* Basic Building Block - A 4-Bit Adder



## Arithmetic Logic Unit

-Fast Adder
, Basic Building Block - A 4-Bit Adder (Timing)

$$
\begin{array}{ll}
\mathrm{F}_{1}=4 \Delta \mathrm{t} & \mathrm{C}_{2}=4 \Delta \mathrm{t} \\
\mathrm{~F}_{2}=6 \Delta \mathrm{t} & \mathrm{C}_{3}=6 \Delta \mathrm{t} \\
\mathrm{~F}_{3}=8 \Delta \mathrm{t} & \mathrm{C}_{4}=8 \Delta \mathrm{t} \\
\mathrm{~F}_{4}=10 \Delta \mathrm{t} & \mathrm{C}_{5}=10 \Delta \mathrm{t}
\end{array}
$$

## Arithmetic Logic Unit

-Fast Adder
*. Carry Lookahead (Scheme 1)

$$
\mathrm{C}_{\mathrm{i}+1}=\mathrm{A}_{\mathrm{i}} \mathrm{~B}_{\mathrm{i}}+\left(\mathrm{A}_{\mathrm{i}} \oplus \mathrm{~B}_{\mathrm{i}}\right) \mathrm{C}_{\mathrm{i}}=\mathrm{A}_{\mathrm{i}} \mathrm{~B}_{\mathrm{i}}+\left(\mathrm{A}_{\mathrm{i}}+\mathrm{B}_{\mathrm{i}}\right) \mathrm{C}_{\mathrm{i}}
$$



## Arithmetic Logic Unit

>Fast Adder

* Carry Lookahead (Scheme 1)

OIn a 4-bit full adder

$$
\begin{aligned}
& \mathrm{C}_{1}=0 \\
& \mathrm{C}_{2}=\mathrm{g}_{1}+\mathrm{P}_{1} \mathrm{C}_{1} \\
& \mathrm{C}_{3}=\mathrm{g}_{2}+\mathrm{P}_{2} \mathrm{C}_{2}=g_{2}+\mathrm{P}_{2} g_{1}+\mathrm{P}_{2} \mathrm{P}_{1} \mathrm{C}_{1} \\
& \mathrm{C}_{4}=g_{3}+\mathrm{P}_{3} \mathrm{C}_{3}=\mathrm{g}_{3}+\mathrm{P}_{3} \mathrm{~g}_{2}+\mathrm{P}_{3} \mathrm{P}_{2} g_{1}+\mathrm{P}_{3} \mathrm{P}_{2} \mathrm{P}_{1} \mathrm{C}_{1} \\
& \mathrm{C}_{5}=\ldots
\end{aligned}
$$

## Arithmetic Logic Unit

$>$ Fast Adder — Carry Lookahead (Scheme 1)
*Extended 4-Bit Full Adder


Carry Lookahead (Scheme 1)


## Arithmetic Logic Unit

$>$ Fast Adder — Carry Lookahead (Scheme 1)

* Extended 4-Bit Full Adder — Timing $\mathrm{d} \cong 2 \Delta t$ $\mathrm{p}^{\mathrm{s}}$ and $\mathrm{g}^{\mathrm{s}}$ are generated in d
$\mathrm{C}^{\mathrm{s}}$ are generated after another d
$\mathrm{F}^{\mathrm{s}}$ are generated after another d


## Arithmetic Logic Unit

$>$ Fast Adder — Carry Lookahead (Scheme 1)


## Arithmetic Logic Unit

Fast Adder — Carry Lookahead (Scheme 2)

Timing


$$
\mathrm{CLA}=5 \Delta \mathrm{t}
$$

Cascades of CLAs overlap $1 \Delta$ t operation


## Arithmetic Logic Unit

>Fast Adder

* Carry Select

OCarry-in to a 4-bit full adder is either 0 or 1.

- Duplicate each stage - e.g., 4-bit full adder.

OInitiate each unit in a stage with carry-in of 0 and 1.

- Use a multiplexer to select the correct answer.


## Arithmetic Logic Unit

>Fast Adder — Carry Select


## Arithmetic Logic Unit

## Questions

* Calculate the execution time of a 16-bit adder using carry lookahead scheme 1.
* Formulate the execution time of an n-bit adder using carry lookahead scheme 1 ( n is a multiple of 4).
* Calculate the execution time of a 16-bit adder using carry lookahead Scheme 2.
*. Formulate the execution time of an n-bit adder using carry lookahead scheme 2 ( n is a multiple of 4).


## Arithmetic Logic Unit

## Questions

* Calculate the execution time of a 16-bit adder using carry select scheme.
* Formulate the execution time of an n-bit adder using carry select scheme.
* Is it possible to combine carry lookahead and carry select concepts to design a faster adder?


## Arithmetic Logic Unit

## M Multiplication

* Multiplication can be performed as a sequence of repeated additions.
* $A$ * $B$ is interpreted as add $A, B$ times. However, such a scheme is very inefficient with a time complexity of $\mathrm{O}(\mathrm{m})$ where $m$ is the magnitude of $B$.
*. A better approach to multiplication, add-and-shift, produces a time complexity of $\mathrm{O}(\mathrm{n})$ where $n$ is the length of the $B$.


## Arithmetic Logic Unit

$\checkmark$ Add-and-shift — hardware configuration

* Multiplier and multiplicand are two n-bit unsigned numbers,
* Result is a $2 n$-bit number stored in an accumulator and multiplier registers.



## Arithmetic Logic Unit

Add-and-shift — Algorithm
*In each iteration the least-significant bit of multiplier is checked;
Oif one, then multiplicand is added to the accumulator and the contents of accumulator and multiplier is shifted right one position.
Oif zero, just shift accumulator and multiplier to the right.
O See module3.background for additional discussion about Add-and-shift algorithm.

## Arithmetic Logic Unit

-Multiplication — Booth's Algorithm

* Booth's algorithm is an extension to the add-and-shift approach.
** In each iteration two bits of multiplier are being investigated and proper action(s) will be taken according to the following coding table:

00 no action shift right once
01 add multiplicand shift right once
10 sub multiplicand shift right once
11 no action shift right once
See module3.background for more discussion about Booth's algorithm.

## Arithmetic Logic Unit

Multiplication — Modified Booth's Algorithm

* Check 3 bits of multiplier at a time and take proper steps as follows:

000 no action
001 add multiplicand
010 add multiplicand
011 add 2*multiplicand
100 sub 2*multiplicand
101 sub multiplicand
110 sub multiplicand
111 no action
shift right twice shift right twice shift right twice shift right twice shift right twice shift right twice shift right twice shift right twice

## Arithmetic Logic Unit

Multiplication — Booth's Algorithm
*Any version of Booth's algorithm allows a sequence of consecutive $1^{\text {s }}$ to be bypassed.
*. Modified Booth's Algorithm is faster than Booth's Algorithm.

* Booth's Algorithm can be further extended by looking at 4 bits, ( 5 bits, ...) at a time and taking proper actions according to the proper encoding table.


## Arithmetic Logic Unit

## Multiplication — Modified Booth's Algorithm



## Arithmetic Logic Unit

Multiplication

* The add-and-shift algorithm can be used to multiply numbers (say $A$ and $B$ ) in $2^{\text {s }}$ complement, if the result is adjusted properly. Three cases can be recognized.
OCase 1: A positive; B negative
OCase 2: A negative; B positive
OCase 3: A negative; B negative


## Arithmetic Logic Unit

Multiplication — Case 1: A positive B negative
*Proof

$$
\begin{aligned}
& A * \widetilde{B}=A *\left(2^{n}-B\right)=2^{n} A-A * B \\
& A * B=2^{n} A-A * \widetilde{B} \\
& 2^{2 n}-A * B=2^{2 n}-2^{n} A+A * \widetilde{B} \\
& \widetilde{A B}=2^{n}\left(2^{n}-A\right)+A * \widetilde{B}=2^{n} \widetilde{A}+A * \widetilde{B}
\end{aligned}
$$

Multiply A and B using add-and-shift algorithm and adjust the result by $2^{\mathrm{n}} \cdot \widetilde{\mathrm{A}}$

## Arithmetic Logic Unit

Questions

* Justify case 2 and case 3.
* Is it possible to use the same technique for $1^{\text {s }}$ complement numbers?


## Arithmetic Logic Unit

-Multiplication - Example
*Perform 00101 * 11010 using add-and-shift algorithm, numbers are in $2^{s}$ complement format:

## Arithmetic Logic Unit

| E | AC | A |  |
| :---: | :---: | :---: | :---: |
| 0 | 00000 | 11010 | $\begin{aligned} & A_{n}=0, \text { shift right EACA } \\ & A_{n}=1, \text { add } B \end{aligned}$ |
| 0 | 00000 | 01101 |  |
|  | 00101 |  |  |
| 0 | 00101 | 01101 | Shift right EACA |
| 0 | 00010 | 10110 | $\mathrm{A}_{\mathrm{n}}=0$, shift right EACA |
| 0 | 00001 | 01011 | $\mathrm{A}_{\mathrm{n}}=1$, add B |
|  | 00101 |  |  |
| 0 | 00110 | 01011 | Shift right EACA$\mathrm{A}_{\mathrm{n}}=1 \text {, add } \mathrm{B}$ |
| 0 | 00011 | 00101 |  |
|  | 00101 |  |  |
| 0 | 01000 | 00101 | Shift right EACA Adjust the result |
| 0 | 00100 | 00010 |  |
|  | 11011 |  |  |
|  | 11111 | 00010 | - Answer |

## Arithmetic Logic Unit

-Fast Multiplication
*Reduction of Summands
OGenerate matrix of summands (partial products).
OGo over several reduction stages using 2-2 and 3-2 adders.

OIn final stage (2 rows) use a fast adder to generate the result.

## Arithmetic Logic Unit

PFast Multiplication — Reduction of Summands


## Arithmetic Logic Unit

FFast Multiplication — Reduction of Summands


$$
\begin{array}{llllllllll} 
& 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & & & 0 & \\
& & & & & & & & & \\
\hline 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & & 1 & & & 0 & \\
\hline 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1
\end{array}
$$

## Arithmetic Logic Unit

>Fast Multiplication — Reduction of Summands

* It is suitable for unsigned numbers.
* Number of reduction stages depends on the length of the multiplier.
* Execution time:



## Arithmetic Logic Unit

>Fast Multiplication

* Iterative Method

OMultiplication of 2 n-bit numbers can be converted into four multiplications of n/2-bit numbers and two additions.

OThis scheme can be iteratively applied to all multiplication terms

## Arithmetic Logic Unit

Fast Multiplication — Iterative Method

$$
\begin{aligned}
& X=2^{n / 2} a+b, Y=2^{n / 2} c+d \\
& X^{*} Y=\left(2^{n / 2} a+b\right)^{*}\left(2^{n / 2} c+d\right)=2^{n}(a c)+2^{n / 2}(a d+b c)+b d
\end{aligned}
$$

## Arithmetic Logic Unit

Fast Multiplication — Iterative Method


$$
\mathrm{X}^{*} \mathrm{Y} \Rightarrow
$$



## Arithmetic Logic Unit

-Fast Multiplication
**) Iterative Method - An Example
$x=a_{3} a_{2} a_{1} a_{0}$
$y=b_{3} b_{2} b_{1} b_{0}$
$x^{\star} y=2^{4}\left(a_{3} a_{2}\right)\left(b_{3} b_{2}\right)+2^{2}\left[\left(a_{3} a_{2}\right)\left(b_{1} b_{0}\right)+\left(a_{1} a_{0}\right)\left(b_{3} b_{2}\right)\right]+$
$\left(a_{1} a_{0}\right)\left(b_{1} b_{0}\right)$

## Arithmetic Logic Unit

-Fast Multiplication
*: Iterative Method - An Example

$$
\begin{aligned}
& x=1010 \\
& y=1100
\end{aligned}
$$



## Arithmetic Logic Unit

|  |  |  |  |  |  |  |  | $\begin{aligned} & \mathrm{a}_{7} \\ & \mathrm{~b}_{7} \end{aligned}$ | $\begin{array}{r} \mathrm{a}_{6} \\ \mathrm{~b}_{6} \\ \hline \end{array}$ | $\begin{aligned} & \mathrm{a}_{5} \\ & \mathrm{~b}_{5} \end{aligned}$ | $\begin{aligned} & \mathrm{a}_{4} \\ & \mathrm{~b}_{4} \\ & \hline \end{aligned}$ | $\begin{aligned} & \mathrm{a}_{3} \\ & \mathrm{~b}_{3} \end{aligned}$ | $\begin{aligned} & \mathrm{a}_{2} \\ & \mathrm{~b}_{2} \end{aligned}$ | $\begin{array}{r} \mathrm{a}_{1} \\ \mathrm{~b}_{1} \end{array}$ | $\mathrm{a}_{0}$ <br> $\mathrm{b}_{0}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  |  |  |  |  |  | $\mathrm{a}_{7} \mathrm{~b}_{0}$ | $a_{6} b_{0}$ | $\mathrm{a}_{5} \mathrm{~b}_{0}$ | $\mathrm{a}_{4} \mathrm{~b}_{0}$ | $a_{3} b_{0}$ | $a_{2} b_{0}$ |  | $\mathrm{a}_{0} \mathrm{~b}_{0}$ |
|  |  |  |  |  |  |  | $\mathrm{a}_{7} \mathrm{~b}_{1}$ | $a_{6} b_{1}$ | $\mathrm{a}_{5} \mathrm{~b}_{1}$ | $\mathrm{a}_{4} \mathrm{~b}_{1}$ | $a_{3} b_{1}$ | $\mathrm{a}_{2} \mathrm{~b}_{1}$ | $a_{1} b_{1}$ | $\mathrm{a}_{0} \mathrm{~b}_{1}$ |  |
|  |  |  |  |  |  | $\mathrm{a}_{7} \mathrm{~b}_{2}$ | $\mathrm{a}_{6} \mathrm{~b}_{2}$ | $\mathrm{a}_{5} \mathrm{~b}_{2}$ | $\mathrm{a}_{4} \mathrm{~b}_{2}$ | $\mathrm{a}_{3} \mathrm{~b}_{2}$ | $\mathrm{a}_{2} \mathrm{~b}_{2}$ | $a_{1} b_{2}$ | $\mathrm{a}_{0} \mathrm{~b}_{2}$ |  |  |
|  |  |  |  |  | $\mathrm{a}_{7} \mathrm{~b}_{3}$ | $a_{6} b_{3}$ | $\mathrm{a}_{5} \mathrm{~b}_{3}$ | $\mathrm{a}_{4} \mathrm{~b}_{3}$ | $a_{3} b_{3}$ | $\mathrm{a}_{2} \mathrm{~b}_{3}$ | $a_{1} b_{3}$ | $a_{0} b_{3}$ |  |  |  |
|  |  |  |  |  |  |  | $\mathrm{a}_{4} \mathrm{~b}_{4}$ |  |  |  | $\mathrm{a}_{0} \mathrm{~b}_{4}$ |  |  |  |  |
|  |  |  | $\mathrm{a}_{7} \mathrm{~b}_{5}$ | $\mathrm{a}_{6} \mathrm{~b}_{5}$ | $\mathrm{a}_{5} \mathrm{~b}_{5}$ | $\mathrm{a}_{4} \mathrm{~b}_{5}$ |  | $\mathrm{a}_{2} \mathrm{~b}_{5}$ |  | $a_{0} b_{5}$ |  |  |  |  |  |
|  |  | $\mathrm{a}_{7} \mathrm{~b}_{6}$ | $a_{6} b_{6}$ | $\mathrm{a}_{5} \mathrm{~b}_{6}$ | $\mathrm{a}_{4} \mathrm{~b}_{6}$ | $a_{3} b_{6}$ | $\mathrm{a}_{2} \mathrm{~b}_{6}$ | $a_{1} b_{6}$ | $a_{0} b_{6}$ |  |  |  |  |  |  |
|  | $\mathrm{a}_{7} \mathrm{~b}_{7}$ | $\mathrm{a}_{6} \mathrm{~b}_{7}$ | $\mathrm{a}_{5} \mathrm{~b}_{7}$ | $\mathrm{a}_{4} \mathrm{~b}_{7}$ | $\mathrm{a}_{3} \mathrm{~b}_{7}$ | $\mathrm{a}_{2} \mathrm{~b}_{7}$ | $\mathrm{a}_{1} \mathrm{~b}_{7}$ | $\mathrm{a}_{0} \mathrm{~b}_{7}$ |  |  |  |  |  |  |  |
| $2^{15}$ | $2^{14}$ | $2^{13}$ | $2^{12}$ | $2^{11}$ | $2^{10}$ | $2^{9}$ | $2^{8}$ | $2^{7}$ | $2^{6}$ | $2^{5}$ | $2^{4}$ | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ |

## Arithmetic Logic Unit

-Fast Multiplication
*. Hurson's Scheme - Observations
OIn a parallel multiplier unit first an n*n matrix of partial products ( M ) is generated and then elements in each column are added.

$$
\mathrm{m}_{\mathrm{ij}} \varepsilon \mathrm{M} \begin{cases}1 & \text { if } \mathrm{B}_{\mathrm{i}}=\mathrm{Q}_{\mathrm{j}}=1 \quad 1 \leq \mathrm{i}, \mathrm{j} \leq \mathrm{n} \\ 0 & \text { otherwise }\end{cases}
$$

## Arithmetic Logic Unit

## PFast Multiplication

* Hurson's Scheme - Observations
- An element $\mathrm{m}_{\mathrm{ij}}$ in M is the result of an AND operation between the $i^{\text {th }}$ bit of multiplicand and $j^{\text {th }}$ bit of multiplier.
- In each column, zero elements do not affect the summation in that column and carry to the next column.
- Each pair of $1^{\mathrm{s}}$ in a column contributes a carry to the next column.
- The result of summation for each column is either zero (even number of $1^{\mathrm{s}}$ ) or one (odd number of $1^{\mathrm{s}}$ ).


## Arithmetic Logic Unit

-Fast Multiplication — Hurson's Scheme

* Generate only non-zero elements in each column.
* For each pair of $1^{\text {s }}$ in a column generate a carry to the next column.
* Count the number of $1^{\text {s }}$ in each column.


## Arithmetic Logic Unit

## - Fast Multiplication

*Hurson's Scheme - Example


## Arithmetic Logic Unit

>Fast Multiplication
*Full Adder Tree
OGenerate matrix of summands.
OUse a binary tree of full-adders to calculate the result in a pipeline fashion

## Arithmetic Logic Unit

## Fast Multiplication — Full Adder Tree



## Arithmetic Logic Unit

## Questions

*For an n * n multiplication, calculate the execution time of the operation using full adder tree scheme.

* Show the "snap shots" of the events to perform: 1101 * 1011 using full adder tree scheme.


## Arithmetic Logic Unit

- Fast Multiplication — Column Compression
* Assume that a population counter is available that can count the number of $1^{s}$ in an n-bit word, producing a $1+\left\lfloor\log _{2} n\right\rfloor$ bit result.
*Similar to the reduction of summands technique one can go through several reduction stages to compress the number of bits in each column.


## Arithmetic Logic Unit

- Fast Multiplication — Column Compression
* Generate matrix of summands.
* Go over several stages using population counters.
* In final stage ( 2 elements in each column) use a fast adder to generate the result.


## Arithmetic Logic Unit

Fast Multiplication — Column Compression


## Arithmetic Logic Unit

Fast Multiplication — Column Compression


## Arithmetic Logic Unit

-Question
*Formulate the execution time of an n * n multiplier unit using column compression scheme.

## Arithmetic Logic Unit

>Division

* Similar to multiplication, one can develop a routine to perform division as a sequence of subtractions.
*However, such an algorithm is very inefficient and slow.
* Instead one can develop an algorithm which performs division as a sequence of Compare, Shift and Subtract operations.


## Arithmetic Logic Unit

>Division

* One should note that division in a binary system is much simpler than the division in decimal system, since the quotient digits are either 0 or 1 .
*To minimize the hardware requirements, we should remember that:
- Comparison can be performed via arithmetic operation (s).
- Subtraction can be performed via complement-addition.
*. In other words; division requires almost the same hardware modules as multiplication does.


## Arithmetic Logic Unit

## Division

* Division can be carried out as a sequence of $n$ iterations.
* Dividend is a double register.
* One bit of the quotient is generated in each iteration.
* At the end of the operation, the quotient is in the 1st half part of the double register (low-order part), and remainder is in the 2nd half part.
* Sign of the quotient is the $\mathrm{X}-\mathrm{OR}$ of the signs of dividend and divisor.
* Sign of the remainder is the same as the sign of the dividend.


## Arithmetic Logic Unit

Division

* Methods of Division
-There are several different algorithms for division:
- Restoring Method
- Non-Restoring Method
- Direct Comparison


## Arithmetic Logic Unit

- Fast Division — SRT Method
* Faster direct division can be developed on normalized numbers by observing sequences of more than one bit of the dividend or partial remainder - i.e., sequences of $0^{5}$ and $1^{5}$ can be skipped.
* This method was proposed to improve binary floating-point arithmetic.


## Arithmetic Logic Unit

## Fast Division — SRT Method

* Assumptions

OThe dividend and divisor are binary fractions.
OThe divisor ( B ) is an n -bit normalized number i.e., $B=.1 \mathrm{~b}_{\mathrm{n}-2} \ldots \mathrm{~b}_{1} \mathrm{~b}_{0}, .5 \leq \mathrm{B}<1$.

OThe dividend-quotient (AQ) combination is a 2 n -bit register - i.e.,


## Arithmetic Logic Unit

>Fast Division — SRT Method

* Assumptions

OThe dividend is normalized during the division operation.
-Divide overflow condition will be detected and steps are taken in order for it to be overcome.

## Arithmetic Logic Unit

- Fast Division — SRT Method
* The divisor is normalized and the dividendquotient combination is adjusted by shifting it left the same number of positions that the divisor was shifted during normalization.
*This step allows that the relative magnitudes of divisor and dividend remain the same.


## Arithmetic Logic Unit

- Fast Division — SRT Method
*AQ is normalized - i.e., for each shift left a 0 is inserted for $\mathrm{q}_{0}$ — Skipping over zeros.

$$
A Q=.1 \quad a_{n-2} \quad \ldots \quad a_{1} a_{0} q_{n-1} \ldots q_{k} \frac{\text { K Zeros }}{00 \ldots} 0
$$

*. After this step, repeat the following sequence of steps:

## Arithmetic Logic Unit

FFast Division - SRT Method
*Subtract divisor from the dividend:
OIf positive result, a 1 is inserted for $\mathrm{q}_{0}$ and left shift AQ register.

## Arithmetic Logic Unit

Fast Division — SRT Method
OIf negative result - i.e.,
$A Q=1.1 \ldots \quad a_{1} a_{0} q_{n-1} \ldots \overbrace{00} \ldots 0$
$\square$ Insert 0 for $\mathrm{q}_{0}$ and shift left AQ register
$\square$ Shift over $1^{\text {s }}$, and insert $1^{\text {s }}$ until


- Add B to A and shift AQ to left


## Arithmetic Logic Unit

- Fast Division — SRT Method
*Perform the following operation



## Arithmetic Logic Unit

Fast Division — SRT Method

|  | . 00000 | 10111 |
| :---: | :---: | :---: |
| Adjust AQ | . 00010 | 111** |
| Shift over $0^{\text {s }}$ | . 10111 | **000 |
| Subtract B | 1.01100 |  |
| Positive Result: | 0.00011 | **000 |
| Shift AQ left, $\mathrm{q}_{0} \leftarrow 1$ | .0011* | *0001 |
| Shift over $0^{\text {s }}$ | .11**0 | $\underline{00100}$ |
| Remainder $\quad$ Quotient |  |  |

## Arithmetic Logic Unit

- Fast Division — SRT Method

$$
\begin{array}{rll}
\mathrm{A} & \mathrm{Q} & \\
\mathrm{AQ} & =.0000110111 & \left(55 * 2^{-10}\right) \\
\mathrm{B} & =.01010 & \left(10 * 2^{-5}\right)
\end{array}
$$

Normalized B = . 10100
Normalized B $=1.01100$

## Arithmetic Logic Unit

Fast Division — SRT Method

|  | .00001 | 10111 |
| :--- | ---: | :--- |
| Adjust AQ | .00011 | $0111^{*}$ |
| Shift over $0^{\mathrm{s}}$ | .11011 | $1^{*} 000$ |
| Subtract B | $\underline{1.01100}$ |  |
| Positive Result: | 0.00111 | $1^{*} 000$ |
| Shift AQ left, $\mathrm{q}_{0} \leftarrow 1$ | .01111 | $* 0001$ |
| Shift over 0s | $.1111^{*}$ | 00010 |
| Subtract B | $\underline{1.01100}$ |  |
| Positive Result: | $0.0101^{*}$ | 00010 |
| Shift AQ left, $\mathrm{q}_{0} \leftarrow 1$ | $.101^{*} 0$ | 00101 |

## Arithmetic Logic Unit

- Fast Division — SRT Method

$$
\begin{aligned}
& \mathrm{A} \\
& \mathrm{~A} \\
& \mathrm{AQ}= .00101 \\
& \mathrm{~B}= .01111 \\
& \text { Normalized } \mathrm{B}= .11110
\end{aligned}
$$

Normalized B $=1.00010$

## Arithmetic Logic Unit

- Fast Division — SRT Method

|  | . 00101 | 00100 |
| :---: | :---: | :---: |
| Adjust AQ | . 01010 | 0100* |
| Shift over $0^{\text {s }}$ | . 10100 | 100*0 |
| Subtract B | 1.00010 |  |
| Negative Result: | 1.10110 | $100 * 0$ |
| Shift AQ left, $\mathrm{q}_{0} \leftarrow 0$ | 1.01101 | 00*00 |
| Add B | . 11110 |  |
| Positive Result: | 0.01011 | 00*00 |
| Shift AQ left, $\mathrm{q}_{0} \leftarrow 1$ | . 10110 | 0*001 |
| Subtract B | 1.00010 |  |
| Negative Result: | 1.11000 | 0*001 |

## Arithmetic Logic Unit

- Fast Division — SRT Method

| Negative Result: | 1.11000 | $0 * 001$ |
| :--- | ---: | :--- |
| Shift AQ left, q | $\leftarrow 0$ | 1.10000 |
| $* 0010$ |  |  |
| Shift over 1s | $1.0000^{*}$ | 00101 |
| Add B | .11110 |  |
| Negative Result: | $1.1111^{*}$ | 00101 |
| Shift AQ left, q $\leftarrow 0$ | $1.111^{*} 0$ | 01010 |
| Correct remainder by | $1.1111^{*}$ |  |
| shifting A and adding B | $\underline{.11110}$ | quotient |
|  | $0.1110^{*}$ |  |

## Arithmetic Logic Unit

- Fast Division — Divisor Reciprocation
*. This method generates the reciprocal of the divisor using an iterative process, and then obtains the quotient by multiplying the dividend by the divisor reciprocal

$$
\mathrm{A} / \mathrm{B}=\mathrm{A} *(1 / \mathrm{B})
$$

## Arithmetic Logic Unit

- Fast Division — Divisor Reciprocation
*The divisor (B) is assumed to be a positive and normalized number,


## $1 / 2 \leq \mathrm{B}<1 \Rightarrow 1<1 / \mathrm{B} \leq 2$

* An initial value $X_{0} \approx 1 / B$ is determined using a ROM table or a combinational logic circuit,

$$
B=.1 \mathrm{~b}_{2} \mathrm{~b}_{3} \ldots \mathrm{~b}_{\mathrm{n}} \Rightarrow \mathrm{X}_{0}=1 . \mathrm{d}_{1} \mathrm{~d}_{2} \ldots \mathrm{~d}_{\mathrm{n}}
$$

## Arithmetic Logic Unit

- Fast Division - Divisor Reciprocation
* Then the following iterative cycles will be performed to determine the inverse value with reasonable accuracy

$$
a_{0}=B x_{0}\left(\begin{array}{l}
x_{1}=x_{0}\left(2-a_{0}\right) \\
a_{1}=a_{0}\left(2-a_{0}\right),
\end{array}, \begin{array}{l}
x_{2}=x_{1}\left(2-a_{1}\right) \\
a_{2}=a_{1}\left(2-a_{1}\right)
\end{array},, \begin{array}{l}
x_{n}=x_{n-1}\left(2-a_{n-1}\right) \\
a_{n}=a_{n-1}\left(2-a_{n-1}\right)
\end{array}\right.
$$

* The number of iterations ( n ) will be chosen to satisfy the following relation:

$$
\left|1-\mathrm{B} \mathrm{x}_{\mathrm{n}}\right| \leq \varepsilon
$$

## Arithmetic Logic Unit

Fast Division — Divisor Reciprocation

* Assume B = . $75 \Rightarrow 1 / \mathrm{B}=1.3333$...
*Take $\mathrm{X}_{0}=1$, naturally $\mathrm{X}_{0}$ is not the exact inverse of B and the error is $\delta=.333333$...

$$
\begin{gathered}
\mathrm{X}_{1}=\mathrm{X}_{0}\left(2-\mathrm{BX}_{0}\right)=1(2-.75)=1.25 \quad \delta=.08333 \ldots \\
\mathrm{X}_{2}=\mathrm{X}_{1}\left(2-\mathrm{BX}_{1}\right)=1.25(2-.75 * 1.25)=1.328125 \\
\delta=.005208333 \ldots \\
\mathrm{X}_{3}=\mathrm{X}_{2}\left(2-\mathrm{BX}_{2}\right)=1.328125(2-.75 * 1.32815) \\
=1.333313 \quad \delta=.000020333 \ldots
\end{gathered}
$$

## Arithmetic Logic Unit

$>$ Fast Division — Multiplicative Division
*. The operation of division is replaced by that of finding a factor $F$ such that:
B * F =1 and $A * F=Q$

* An iterative method can be used to determine $F$.


## Arithmetic Logic Unit

Fast Division — Multiplicative Division

* In each iteration a constant factor (multiplying factor) $\mathrm{F}_{\mathrm{i}}(1 \leq \mathrm{i} \leq \mathrm{n})$ is calculated to converge the denominator (Divisor) rapidly toward 1.

$$
Q=\frac{A^{*} F_{0} * F_{1} * \ldots * F_{n}}{B^{*} F_{0} * F_{1} * \ldots * F_{n}}
$$

## Arithmetic Logic Unit

Fast Division — Multiplicative Division

* The numerator (dividend) and the denominator (divisor) are both positive fractions.
* The divisor is a normalized number and the dividend is shifted accordingly.

$$
1 / 2 \leq \mathrm{B}<1, \quad \mathrm{~B}=1-\delta \Rightarrow 0<\delta \leq 1 / 2
$$

## Arithmetic Logic Unit

Fast Division — Multiplicative Division
${ }_{*}^{*} \mathrm{~F}_{\mathrm{i}}^{\mathrm{s}}(0 \leq \mathrm{i} \leq \mathrm{n})$ are chosen such that $\mathrm{B}_{\mathrm{i}-1}<\mathrm{B}_{\mathrm{i}}$, Where

$$
\begin{aligned}
& B_{0}=B^{*} F_{0} \\
& B_{1}=B^{*} F_{0} * F_{1} \\
& \vdots \\
& B_{i-1}=B^{*} F_{0}{ }^{*} F_{1}{ }^{*} \ldots * F_{i-1} \\
& B_{i}=B{ }^{*} F_{0} * F_{1}{ }^{*} \ldots{ }^{*} F_{i-1}^{*} F_{i} \\
& \vdots \\
& B_{n}=B_{n-1} * F_{n}
\end{aligned}
$$

## Arithmetic Logic Unit

$>$ Fast Division — Multiplicative Division
酸 $=1-\delta \Rightarrow \mathrm{F}_{0}=1+\delta$, hence:

$$
B_{0}=(1-\delta)(1+\delta)=1-\delta^{2} \Rightarrow B_{0} \text { is closer to } 1 \text { than } B
$$

* $\mathrm{F}_{1}=1+\delta^{2}$ hence:

$$
\mathrm{B}_{1}=\mathrm{B}_{0} * \mathrm{~F}_{1}=\left(1-\delta^{2}\right)\left(1+\delta^{2}\right)=\left(1-\delta^{4}\right)
$$

$F_{i}=1+\delta^{21}$

## Arithmetic Logic Unit

Fast Division — Multiplicative Division

* Note: The initial multiplying factor ( $\mathrm{F}_{0}$ ) can be obtained by a table look up.
* Note: Since we are dealing with binary numbers

$$
F_{i}=1+\delta^{2 i}=2-\left(1-\delta^{2 i}\right)=2-\quad B_{i-1}=\widetilde{B_{i-1}}
$$

## Arithmetic Logic Unit

$>$ Fast Division — Multiplicative Division
*For each iteration two multiplications are required:

- One to process the next denominator from which the next multiplying factor is obtained, and
OOne that produces the next numerator.


## Arithmetic Logic Unit

Fast Division - Multiplicative Division

$$
\begin{array}{lll}
\mathrm{F}_{0} \text { Obtain } & \mathrm{B}_{0}=\mathrm{B} * \mathrm{~F}_{0} \quad \mathrm{~A}_{0}=\mathrm{A} * \mathrm{~F}_{0} \\
\mathrm{~F}_{1}=\widetilde{\mathrm{B}}_{0} & \mathrm{~B}_{1}=\mathrm{B}_{0} * \mathrm{~F}_{1} \quad \mathrm{~A}_{1}=\mathrm{A}_{0} * \mathrm{~F}_{1} \\
& \text { If } \mathrm{B}_{1}=1 \text { then } \mathrm{A}_{1}=\mathrm{Q}, \text { Terminate } \\
\mathrm{F}_{2}=\widetilde{\mathrm{B}}_{1} & \mathrm{~B}_{2}=\mathrm{B}_{1} * \mathrm{~F}_{2} \quad \mathrm{~A}_{2}=\mathrm{A}_{1} * \mathrm{~F}_{2} \\
& \text { If } \mathrm{B}_{2}=1 \text { then } \mathrm{A}_{2}=\mathrm{Q}, \text { Terminate }
\end{array}
$$

