 |
 |
Mar 10, 2013
By Bradley Budlong
If you've used the datapaths to perform operations, you know that they can perform an add or subtract and they can perform various logical operators such as AND, OR, and XOR along with shifting. A couple of the Cypress component development engineers have decided to take the capabilities of the datapaths much further. In a recent post to the Community Components section of the Cypress Forums, Richard Mc Sweeney has provided a component that can perform trigonometric functions using the CORDIC algorithm. Another component developer, CJ Cullen, has provided me with his implementation of a square root function using datapaths. I'll use the next several blog posts to go through CJ's description of the implementation he put together.
Algorithm
The approach we'll use for a square root algorithm is essentially a guess-and-check. To calculate the square root of N, we will continue to make a guess (let's call it root) until that guess is as close to sqrt(N) as possible without going over. At each guess, we will be checking if (root + Delta)2 <= N, where Delta gets progressively smaller until it is beyond the precision of our data type. If (root + Delta)2 is smaller than N, we will add Delta to root to get closer to our final answer. To do this intelligently (and efficiently on hardware), we will use powers of 2: Delta = 2n/2 - 1, 2n/2 - 2,.., 2, 1, where n is the maximum number of bits in any N we want our algorithm to handle. For you computer-sciency types, we are essentially performing a binary search of the solution space of our problem. This gives us a first cut of our algorithm:
Try #1
Delta = 2n/2 - 1, root = 0
while (Delta > 0)
{
if ((root + Delta)2 <= N)
{
root += Delta
}
Delta >>= 1
}
This looks okay. We take advantage of the fact that Delta is a power of 2, and a shift-right can get us every value of Delta that we need. However, we still have a multiply at every step, and that is never going to be fast or easy in the PSoC UDBs. We can do better.
Instead of checking against N every step, let's instead keep track of the remainder instead. Let M = N - root2. We can calculate that at the end of each step and hold onto it for the next step. In our first try above, we were checking ((root + Delta)2 <= N). Expanding that out, it is equivalent to checking the condition (root2 + 2*root*Delta + Delta2 <= N), which we can reorganize as (2*root*Delta + Delta2 <= N - root2), finally substituting in M to get (Delta *(2*root + Delta) <= M). There are now two multiplies necessary to compute our condition. Fortunately, one is a multiply by 2, the other is a multiply by a power of 2, so all we really have is shifts, which we can do just fine in the UDBs. This gives us a much more efficient algorithm:
Try #2
Delta = 2n/2 - 1, root = 0, M = N
while (Delta > 0)
{
if (Delta *(2*root + Delta) <= M)
{
M -= Delta*(2*root + Delta)
root += Delta
}
Delta >>= 1
}
This looks a lot better. All of the operations can be done efficiently in the datapath. Shifts and adds and subtracts are all one cycle apiece. But, the datapath can only shift one bit at a time. Multiplying Delta by (2*root + Delta) can take up to n/2 shifts. This isn't so bad, but it causes our algorithm to scale quadratically as we increase the number of bits. That still leaves another chance for optimization.
If we keep track of Delta and root, we are going to have to do some sort of multiplication by Delta at every step. Instead, we can build in the "multiply by Delta" into the variables that we track. We know that we will be shifting Delta one bit to the right at every iteration, so we can coordinate that shift with the other variables that we track. Let's define a variable D to track Delta*Delta and a variable q to track 2*root*Delta. Now, our condition has been simplified to (D + q <= M). Our old Delta variable was shifting right one bit every iteration, so our new D variable (Delta*Delta) should shift two bits to the right every iteration. Also, our q variable has a factor of Delta built into it, so we now need to shift that to the right by a bit each time through. With these substitutions, we come up with the following:
Try #3
D = 2n-2, q = 0, M = N
while (D > 0)
{
if (D + q <= M)
{
M -= (D + q)
q += 2*D
}
D >>= 2
q >>= 1
}
In the final iteration, when the "D >>= 2" causes D to change from 0x01 to 0x00, it actually represents changing our Delta value from 20 to 2-1. Therefore, q (which we are using to track 2*Delta*root) is now equal to 2*(2-1)*root, our final answer.
We've now reduced our algorithm to a bounded number of shifts and adds for each iteration. In the next post I'll go over how we can get this algorithm executing in the datapath.
|
|