Booth's algorithm: A comprehensive guide to fast signed multiplication

24Sep

Booth’s algorithm: A comprehensive guide to fast signed multiplication

Booth’s algorithm is a cornerstone technique in computer arithmetic, celebrated for its efficiency in multiplying binary integers. By recognising patterns in the multiplier and recoding the operation sequence, this method reduces the number of addition and subtraction steps required. In modern processors, embedded systems, and digital signal processing, Booth’s algorithm remains a practical choice for implementing fast, reliable multiplication. This article provides a thorough, reader-friendly exploration of Booth’s algorithm, its origins, how it works, its variants, and how to implement it in software and hardware with clear examples and tips for optimisation.

What is Booth’s algorithm?

Booth’s algorithm is a multiplication technique for signed binary numbers. Introduced by Andrew Booth in the early 1950s, the method transforms the standard multiplication problem into a sequence of a few simple operations: conditional addition or subtraction of the multiplicand, followed by shifting. The key idea is to examine pairs of bits from the multiplier rather than handling each bit independently. This bit-pair recoding reduces the number of non-trivial operations, particularly when there are long runs of consecutive ones or zeros in the multiplier.

In essence, the algorithm aims to minimise the workload by encoding the multiplier in a way that permits rapid partial products. Instead of adding the multiplicand for every nonzero multiplier bit, Booth’s algorithm groups bits and triggers add or subtract actions only when the group boundaries change. The result is a multiplication process that is often more efficient in both time and hardware resource usage, especially for wide integers.

The historical context of Booth’s algorithm

Andrew Booth introduced the algorithm in 1951 as a means to speed up binary multiplication for early computing hardware. At the time, the aim was to eliminate as many carry-save and carry-propagation operations as possible, given the hardware constraints of the era. The core concept—recoding the multiplier to reduce the number of adding steps—proved robust enough to endure for decades. Over time, Booth’s algorithm was extended and refined into various forms, including radix-4 Booth encoding, which further reduces the iteration count by handling two multiplier bits at a time rather than one.

Today, Booth’s algorithm sits alongside other well-known multiplication techniques. It remains a staple in computer architecture courses because it provides a clear, implementable approach to signed multiplication that scales gracefully with word length. Its enduring relevance is a testament to the elegance of the idea: look at the changes in the multiplier rather than every individual bit in isolation, and use targeted additions to build the final product.

Core principles: how Booth’s algorithm works

At the heart of Booth’s algorithm is a simple procedure built around the following ideas:

Represent the multiplier in two’s complement form to handle negative numbers naturally.
Process the multiplier from least significant bit to most significant bit, looking at bit pairs (Qi, Qi-1), where Qi is the i-th bit and Qi-1 is the previous bit (with an initial quiescent bit Q-1 set to 0).
Decide, based on the observed pair, whether to add, subtract, or do nothing with the multiplicand M.
After each decision, perform a right shift on the combined accumulator (which contains partial sum and multiplier) to prepare for the next step.

The decision table is straightforward and powerful. For the pair (Qi, Qi-1):

00: No operation occurs; move to the next bit.
01: Add the multiplicand M to the running total.
10: Subtract the multiplicand M from the running total.
11: No operation occurs; this pair represents a continuation of a prior non-zero action.

After each decision, the combined result is shifted to the right. The most significant bit of the accumulator extends to maintain the sign in two’s complement arithmetic. This sequence is repeated for the width of the multiplier, producing the final product in the accumulator when all iterations are complete.

Bit-pair recoding: the mechanism in practice

To understand the mechanism more concretely, picture the multiplier as a string of bits, with Q0 being the least significant bit and Qn-1 the most significant bit. We examine pairs (Q1, Q0), (Q2, Q1), and so on, while keeping a running q-1 bit that starts at 0. Each pair guides a potential addition or subtraction of M, followed by a right arithmetic shift of the combined register. The beauty of the approach is that long stretches of zeros or ones in the multiplier lead to fewer non-trivial operations, so the overall cost in addition steps drops substantially compared with a straightforward shift-and-add workflow.

Radix-4 Booth encoding and other refinements

One of the most important refinements is radix-4 Booth encoding. By grouping two consecutive multiplier bits at a time (instead of handling individual bits), radix-4 Booth’s algorithm reduces the number of iterations by approximately half. The trade-off is a more complex decision table and a more intricate set of partial products, but the net effect is a faster multiplier for wide word lengths. In radix-4 Booth encoding, the possible actions are expanded to include multiples of the multiplicand by -2, -1, 0, 1, or 2, depending on the current two-bit group. This allows for larger jumps toward the final product while keeping hardware implementations efficient.

Other variants and optimisations exist, including support for higher radices (like radix-8 Booth encoding) and adaptations for specific hardware constraints such as pipelining or parallelism. Each variant trades off complexity for potential speed gains, and the choice often depends on the target platform, the expected range of input values, and the desired balance between silicon area and throughput.

Algorithm in action: a step-by-step example

To illustrate Booth’s algorithm in a tangible way, we can work through a compact, 4-bit example. Suppose we want to multiply M = 3 (0011 in binary) by Q = 5 (0101 in binary). The product is 15 (0000 1111 in 8-bit representation). We use an 8-bit accumulator A|Q with an initial q-1 bit set to 0 and A initially 0. The process proceeds as follows:

Initial state: A = 00000000, Q = 00000101, q-1 = 0

i = 0: examine (Q0, q-1) = (1, 0) -> 01 → add M
A = A + M = 00000000 + 00000011 = 00000011
Right shift (A, Q) together with new q-1
A Q: 00000011 00000101 -> after right shift: A = 00000001, Q = 10000010, q-1 = 1

i = 1: examine (Q1, q-1) = (0, 1) -> 01 or 10? (Q1 is 0, q-1 is 1) -> 01 → add M
A = 00000001 + 00000011 = 00000100
Right shift
A Q: 00000100 10000010 -> after right shift: A = 00000010, Q = 01000001, q-1 = 0

i = 2: examine (Q2, q-1) = (0, 0) -> 00 → no operation
Right shift
A Q: 00000010 01000001 -> after right shift: A = 00000001, Q = 00100000, q-1 = 0

i = 3: examine (Q3, q-1) = (0, 0) -> 00 → no operation
Right shift
A Q: 00000001 00100000 -> after final shift: A = 00000000, Q = 00010000, q-1 = 0

Final product (A concatenated with Q): 00000000 00010000 = 0000000000010000 = 15

While this example is simplified, it demonstrates the core idea: decisions are driven by bit-pair patterns, and after each decision the combined register is shifted to prepare for the next step. In real hardware or software implementations, the example scales naturally to wider word lengths with additional iterations but the fundamental workflow remains consistent.

Implementing Booth’s algorithm in hardware

In hardware, Booth’s algorithm is typically implemented as an iterative process within a multiplier unit. The design must manage:

Input handling: two’s complement representation for signed numbers, sign extension to the chosen width, and a properly initialised q-1 bit.
Control logic: the decision table that maps the current pair (Qi, Qi-1) to an operation (add, subtract, or none).
Adder and shifter resources: the multiplicand may be held in a register, and a dedicated adder performs the required addition or subtraction. A barrel shifter or a sequence of shifts moves the partial product into the proper position.
Sign extension and finalisation: eight, sixteen, or wider bits may be used for the result, with care taken to ensure the sign bit is correctly extended into the upper region of the product.

Radix-4 Booth encoding can halve the number of cycles needed for a given width, but it demands more complex control logic to interpret two-bit groups and to produce the corresponding multiples of the multiplicand (including ±2M). Modern digital design often uses a combination of Booth encoding, Wallace tree adders, and pipelining to achieve high throughput while minimising latency and area.

Software implementation: how to write Booth’s algorithm

Software implementations of Booth’s algorithm typically operate on unsigned integers or signed integers converted to two’s complement form. A straightforward approach uses a loop over the bits of the multiplier, applying the bit-pair decision at each step and maintaining a running partial product. The following high-level pseudocode is representative of a standard Booth’s algorithm variant:

procedure BoothMultiply(M, Q, n):
    A := 0                 // partial accumulator (n+1 bits)
    QMinus1 := 0
    for i from 0 to n-1:
        currentPair := (Qi, QMinus1)
        if currentPair == (1, 0):     // 01
            A := A + M
        else if currentPair == (0, 1):  // 10
            A := A - M
        end if
        // arithmetic right shift of (A, Q) with new QMinus1
        (A, Q, QMinus1) := ArithmeticRightShift(A, Q, QMinus1)
    end for
    return concatenate(A, Q)

The exact syntax and data types depend on the language, but the structure remains the same: a loop over the width of the multiplier, a small set of operations (add, subtract, or none), and a final combination of the partial results. Optimisations in software include using intrinsic instructions for addition with carry, performing additions in parallel where possible, and leveraging radix-4 variants to halve the iteration count. For very wide operands, consider using a pipelined approach or a blocking strategy to maintain throughput on modern CPUs.

Pseudo-code for radix-4 Booth’s algorithm in software

procedure BoothMultiplyRadix4(M, Q, n):
    // Booth encoding with two-bit groups (radix-4)
    A := 0
    Q := Q
    QMinus1 := 0
    i := 0
    while i < n:
        twoBits := (Qi+1, Qi)          // group two bits
        action := lookupRadix4(twoBits) // {+M, -M, +2M, -2M, 0, etc}
        A := A + action * M
        // shift (A, Q) right by 2 bits, update QMinus1 accordingly
        (A, Q, QMinus1) := ArithmeticRightShiftRadix4(A, Q, QMinus1)
        i += 2
    end while
    return concatenate(A, Q)

Radix-4 implementations require careful handling of the grouping and the potential for larger multiples of M (such as ±2M). The lookup table for actions is deterministic and depends on the two-bit group, simplifying the control path. This approach reduces iteration count and can yield better performance on hardware that supports fast shift operations and parallel additions.

Advantages and limitations of Booth’s algorithm

Booth’s algorithm offers several notable advantages:

Reduced number of additions: By encoding the multiplier into bit patterns, the algorithm minimizes redundant add/sub operations, especially when the multiplier contains long runs of ones or zeros.
Natural handling of signed numbers: The use of two’s complement arithmetic simplifies dealing with negative values.
Scalability: The method scales well with word length and can be adapted with radix-4 or higher encodings to further improve throughput.

However, there are also considerations and potential drawbacks:

Increased control complexity: Radix-4 and higher-radix variants require more sophisticated control logic and data-paths, which can complicate hardware design.
Latency vs. throughput trade-offs: While radix-4 reduces the number of iterations, the per-iteration complexity grows due to handling larger multiples of M and more complex shifting.
Special cases: Implementations must carefully manage edge cases, such as overflow and sign extension, to ensure correct results for all inputs.

Booth’s algorithm versus the classic shift-and-add multiplier

The traditional shift-and-add multiplier handles each multiplier bit with a conditional addition of the multiplicand. This approach is straightforward and easy to implement but can incur a higher number of addition operations, particularly for wide multipliers with many non-zero bits. Booth’s algorithm improves on this by examining bit pairs and generating partial products more efficiently. In practical hardware design, Booth’s method often yields superior performance and lower area, especially when combined with proper pipelining and architectural optimisations. Nevertheless, in some contexts, a well-optimised shift-and-add unit may be simpler to implement and sufficient for the required throughput.

Practical considerations for teaching Booth’s algorithm

When explaining Booth’s algorithm to students or colleagues, a few pedagogical tips help crystallise understanding:

Start with intuition: illustrate why looking at bit pairs can reduce the number of operations compared with a naïve bit-by-bit approach.
Use small, concrete examples: work through a handful of 4-bit or 5-bit cases before generalising to wider widths.
Highlight sign handling: emphasise how two’s complement representation permits seamless processing of negative numbers.
Bridge to hardware: talk through how a real multiplier would implement the control logic, adder, and shifter, and why radix-4 offers a favourable speed-area trade-off.
Provide a migration path: show how the standard Booth’s algorithm connects to radix-4 variants and how to upgrade a simple implementation to a higher radix for performance gains.

Common pitfalls and debugging tips

Developers implementing Booth’s algorithm should be mindful of the following potential pitfalls:

Incorrect initialisation: ensure q-1 is initialised to 0 and the accumulator width is sufficient to avoid overflow during intermediate steps.
Sign extension errors: when performing arithmetic right shifts, make sure the sign bit is correctly propagated to preserve two’s complement semantics.
Bit ordering mistakes: keep a consistent convention for the bit order of the multiplier and the grouping scheme, particularly when using radix-4 or higher encoding.
Edge-case handling: validate the algorithm with maximum and minimum representable values, including zero multipliers and alternating bit patterns.
Hardware mapping: ensure that the adder, shifter, and control logic are correctly synchronised to avoid glitches in a pipelined design.

Real-world applications of Booth’s algorithm

Booth’s algorithm remains relevant across a range of computing domains:

Microprocessors and digital signal processors (DSPs): fast signed multiplication is essential for real-time signal processing and graphics computations, where Booth’s method offers a good balance of speed and hardware cost.
Embedded systems: compact, low-power multipliers benefit from radix-4 Booth encoding to achieve higher throughput without excessive silicon area.
ASIC and FPGA designs: Booth’s technique maps well to custom hardware implementations, with radix-4 and other variants allowing optimisations for a given fabrication process and toolchain.
Educational demonstrations: Booth’s algorithm provides a clean, approachable example of how clever bit-level recoding can accelerate arithmetic operations, making it a staple in computer architecture courses.

Depth perspectives: mathematical framing of Booth’s algorithm

From a mathematical standpoint, Booth’s algorithm can be seen as a clever re-expression of the product M × Q in two’s complement form. By recoding the multiplier into a sequence of signed multiples of M, the algorithm effectively compresses the operation count. The partial products can be viewed as successive contributions to the final product, each chosen according to the current bit-pair pattern. The right-shift step serves to align these contributions correctly in the final sum, preserving the proper place value for each iteration. This framing helps bridge the intuition from binary arithmetic to hardware-friendly implementations.

Extended discussion: architecture-aware optimisations

In high-performance design, the choice of Booth variant is often guided by the target architecture:

Radix-4 Booth encoding is frequently preferred in wide-word multipliers because it halves the number of iterations, which can significantly reduce latency in pipelined designs.
Combining Booth’s algorithm with Wallace tree adders and carry-save adders can further streamline the accumulation of partial products, reducing critical path delays.
Resource-sharing strategies, where the same adder and shifter blocks are reused across multiple iterations in a tightly coupled pipeline, can minimise silicon area without sacrificing throughput.
Software-hardware co-design approaches may implement Booth’s algorithm in firmware for small, low-power devices while using a purely hardware multiplier for performance-critical paths.

Conclusion: the enduring value of Booth’s algorithm

Booth’s algorithm stands as a testament to the enduring value of clever bit-level thinking in computing. By changing the perspective from processing each multiplier bit in isolation to recognising and exploiting patterns across two-bit blocks, the method reduces the number of necessary additions and streamlines the multiplication process. Its variants, especially radix-4 Booth encoding, offer practical performance advantages for modern hardware, while remaining accessible to software developers through clear pseudocode and robust theoretical foundations. Whether you are studying computer arithmetic, designing a custom processor, or implementing a robust numerical library, Booth’s algorithm provides a rich, well-founded approach to fast, reliable signed multiplication.