## Implementing Complex Math using the Datapath

If you've used the datapaths to perform operations, you know that they can perform an add or subtract and they can perform various logical operators such as AND, OR, and XOR along with shifting. A couple of the Cypress component development engineers have decided to take the capabilities of the datapaths much further. In a recent post to the Community Components section of the Cypress Forums, Richard Mc Sweeney has provided a component that can perform trigonometric functions using the CORDIC algorithm. Another component developer, CJ Cullen, has provided me with his implementation of a square root function using datapaths. I'll use the next several blog posts to go through CJ's description of the implementation he put together.

Algorithm

The approach we'll use for a square root algorithm is essentially a guess-and-check. To calculate the square root of N, we will continue to make a guess (let's call it root) until that guess is as close to sqrt(N) as possible without going over. At each guess, we will be checking if (root + Delta)2 <= N, where Delta gets progressively smaller until it is beyond the precision of our data type. If (root + Delta)2 is smaller than N, we will add Delta to root to get closer to our final answer. To do this intelligently (and efficiently on hardware), we will use powers of 2: Delta = 2n/2 - 1, 2n/2 - 2,.., 2, 1, where n is the maximum number of bits in any N we want our algorithm to handle. For you computer-sciency types, we are essentially performing a binary search of the solution space of our problem. This gives us a first cut of our algorithm:

Try #1

Delta = 2n/2 - 1, root = 0
while (Delta > 0)
{
if ((root + Delta)2 <= N)
{
root += Delta
}
Delta >>= 1
}

This looks okay. We take advantage of the fact that Delta is a power of 2, and a shift-right can get us every value of Delta that we need. However, we still have a multiply at every step, and that is never going to be fast or easy in the PSoC UDBs. We can do better.

Instead of checking against N every step, let's instead keep track of the remainder instead. Let M = N - root2. We can calculate that at the end of each step and hold onto it for the next step. In our first try above, we were checking ((root + Delta)2 <= N). Expanding that out, it is equivalent to checking the condition (root2 + 2*root*Delta + Delta2 <= N), which we can reorganize as (2*root*Delta + Delta2 <= N - root2), finally substituting in M to get (Delta *(2*root + Delta) <= M). There are now two multiplies necessary to compute our condition. Fortunately, one is a multiply by 2, the other is a multiply by a power of 2, so all we really have is shifts, which we can do just fine in the UDBs. This gives us a much more efficient algorithm:

Try #2

Delta = 2n/2 - 1, root = 0, M = N
while (Delta > 0)
{
if (Delta *(2*root + Delta) <= M)
{
M -= Delta*(2*root + Delta)
root += Delta
}
Delta >>= 1
}

This looks a lot better. All of the operations can be done efficiently in the datapath. Shifts and adds and subtracts are all one cycle apiece. But, the datapath can only shift one bit at a time. Multiplying Delta by (2*root + Delta) can take up to n/2 shifts. This isn't so bad, but it causes our algorithm to scale quadratically as we increase the number of bits. That still leaves another chance for optimization.

If we keep track of Delta and root, we are going to have to do some sort of multiplication by Delta at every step. Instead, we can build in the "multiply by Delta" into the variables that we track. We know that we will be shifting Delta one bit to the right at every iteration, so we can coordinate that shift with the other variables that we track. Let's define a variable D to track Delta*Delta and a variable q to track 2*root*Delta. Now, our condition has been simplified to (D + q <= M). Our old Delta variable was shifting right one bit every iteration, so our new D variable (Delta*Delta) should shift two bits to the right every iteration. Also, our q variable has a factor of Delta built into it, so we now need to shift that to the right by a bit each time through. With these substitutions, we come up with the following:

Try #3

D = 2n-2, q = 0, M = N
while (D > 0)
{
if (D + q <= M)
{
M -= (D + q)
q += 2*D
}
D >>= 2
q >>= 1
}

In the final iteration, when the "D >>= 2" causes D to change from 0x01 to 0x00, it actually represents changing our Delta value from 20 to 2-1. Therefore, q (which we are using to track 2*Delta*root) is now equal to 2*(2-1)*root, our final answer.

We've now reduced our algorithm to a bounded number of shifts and adds for each iteration. In the next post I'll go over how we can get this algorithm executing in the datapath.

ALL CONTENT AND MATERIALS ON THIS SITE ARE PROVIDED "AS IS". CYPRESS SEMICONDUCTOR AND ITS RESPECTIVE SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY OF THESE MATERIALS FOR ANY PURPOSE AND DISCLAIM ALL WARRANTIES AND CONDITIONS WITH REGARD TO THESE MATERIALS, INCLUDING BUT NOT LIMITED TO, ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT OF ANY THIRD PARTY INTELLECTUAL PROPERTY RIGHT. NO LICENSE, EITHER EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, IS GRANTED BY CYPRESS SEMICONDUCTOR. USE OF THE INFORMATION ON THIS SITE MAY REQUIRE A LICENSE FROM A THIRD PARTY, OR A LICENSE FROM CYPRESS SEMICONDUCTOR.

Content on this site may contain or be subject to specific guidelines or limitations on use. All postings and use of the content on this site are subject to the Terms and Conditions of the site; third parties using this content agree to abide by any limitations or guidelines and to comply with the Terms and Conditions of this site. Cypress Semiconductor and its suppliers reserve the right to make corrections, deletions, modifications, enhancements, improvements and other changes to the content and materials, its products, programs and services at any time or to move or discontinue any content, products, programs, or services without notice.