Mar 31, 2013
Now that all the design work has been done for the square root component, we can do the work of creating the component in the tool. First, create a new project in PSoC Creator. Then, create a new component by right clicking your project in the Components tab, selecting "Add Component Item..." and selecting "Verilog File":
Now, load up the Datapath Configuration Tool and open the Verilog file that was created in the previous step. Select Edit->New Datapath. Choose the Instance Type to be cy_psoc3_dp8, give it a name, and click OK.
This will create a single 8-bit datapath with all of the default settings. Now we can configure the CFGRAM to implement our 8 operations. I've highlighted the important entries for each operation, as well as any that changed from the default:
Now we can configure the static portion of the datapath configuration. There isn t much to do here. We need to make sure that our comparison is configured correctly, and that both FIFOs have dynamic mode enabled. Everything else can stay with the default values.
Now, save the file and close the Datapath Configuration tool. The last step is editing the Verilog file to add the state machine parameter values from the last blog post, the actual state machine and a few other pieces of Verilog to complete the logic of the component.
We'll define a few more control signals: A fifo_dyn wire to determine when the FIFOs are CPU/DMA-controlled vs. datapath-controlled, an eoc wire to indicate when computation has completed, and a cs_addr wire to dynamically select the operation. Note that this is simply the lower 3 bits of the state. We'll also need wires for each of the two conditions that we plan on using in the state machine:
The state machine is relatively simple. It can be pieced together from the state diagram:
Finally, we need to hook up our control and condition wires to the datapath:
We now have a component that configures the datapath to compute the square root of an 8-bit number. In the CJCU_Isqrt component, CJ has extended this implementation to 16, 24, and 32 bits, as well as providing APIs. For information on how to extend datapath functionality to multiple bytes, see PSoC Creator 213. To see how to design component APIs for datapath components, see PSoC Creator 214.
Rating: Be the first to rate
Mar 30, 2013
In the previous post, we came up with the 13 states necessary to calculate the square root. There's no limit on the number of states we can have, but the number of unique operations is limited to 8. Currently we have 10, so we'll need a few tricks to make this work.
Notice that I haven't listed the "(load F0/F1)" as part of the operations. FIFO loading is controlled directly from our state machine, independent of the operation.
We need to trim out two operations.
First, notice that in state 11 (op 9), we only care about the "A1 = F1" part. It doesn t matter at all what happens to A0, so we can have state 11 reuse op 8.
Second, notice that op 5 and op 7 are similar, except that op 7 also loads F0 into A0. State 5 currently starts with an empty F0, but we can have state 4 load F0 and then state 5 can copy it back. State 5 can then reuse op 7, so long as we also modify state 4 to load A0 into F0.
We now have the following operations:
And the following states:
Or in its state diagram form:
The next trick is in mapping the states to the operations using Verilog. We have 13 states, so we'll need 4 bits to encode them. Conveniently, no operation is used in more than two states. This allows us to use the bottom three bits to encode the operations we defined above, while using the top bit to distinguish between the two states that use that operation. With this mapping no additional resources are needed to map from states to operations.
It's usually good practice to give meaningful names to states, but since I've been using numbers above, I'll continue to name them with numbers here:
I've noted in the comments which states we want to trigger a FIFO load. We use this to define wires that will be used as control signals to the datapath:
We're almost done at this point. The next blog post will conclude the implementation of the square root component.
Rating: Be the first to rate
Mar 25, 2013
Last post, I went through a couple of iterations of a square root algorithm before arriving at an efficient method using only shifts and adds. Now, let's figure out how to configure the datapath to execute this algorithm.
The C-style pseudo-code from the previous post is nice, but it will be easier to map if we try to represent it in a state diagram:
Notice that I've reordered a few of the operations. Instead of adding 2*D to q and then shifting the result to the right, I am now shifting q first and then adding D. This saves a shift on one of the two branches.
Now we can start thinking about how the datapath can do this.
Normally the datapath can only track two variables as a part of an iterative algorithm (one in A0, one in A1). With the latest version of the datapath which is what is in PSoC 3 and PSoC 5LP (not available in PSoC 5), there is a new mode called FIFO Dynamic Mode. In the standard mode, the datapath FIFOs are configured statically as either input or output. An input FIFO is written by the CPU or DMA and read by the datapath. An output FIFO is the opposite. With FIFO Dynamic Control, the FIFOs can be individually and dynamically selected to be in either internal or external mode. Internal mode means the datapath has full read and write control over the FIFO. External mode gives the CPU/DMA full control. This means that the CPU can load values into a FIFO in external mode, switch it into internal mode to allow the datapath to perform some computation, and then read the values back out of the same FIFO once the datapath has completed. This also means that the datapath has up to 8 new places to store computed values (two 4-byte FIFOs). Medium-complexity computations that used to eat up CPU cycles can now be offloaded and/or parallelized, speeding up applications and letting the CPU focus on more complex problems. In the square root algorithm we'll just need storage for 3 variables.
It usually makes sense to clearly partition variables between the two sides of the datapath (A0/D0/F0 vs. A1/D1/F1). It is possible to configure F0 to receive writes from A1, and to use one of our dynamic configuration operations to swap A0 and A1, but this adds complexity and cycles. For this algorithm, I'll track D in A0/D0/F0, and I'll track q and M in A1/D1/F1.
Now let's start looking at the states we'll need for our datapath state machine. The first thing we need to set up are the initial conditions. M is initially set to N, the radicand of our square root. Let's assume that value got put into a FIFO (specifically F1) before FIFO control got switched to internal. The other two initial values are static, so we can use D0 to hold 2n - 2, and D1 to hold 0. Then, the first thing we need to do is load these values out of the data register into the accumulators. Luckily, we can independently select the write-source for A0 and A1, so we can load from the data registers in parallel. We can also use this state as our idle state. If we have some "start" signal, we are just fine spinning here until start goes high. That gives us our initial state:
Next, we need to compute D + q. Because we will be comparing this to M (partitioned to the "1" side) it will be easiest to store the result of the addition in A0. In parallel, we can also load M into A1 to prepare for the comparison. We need to keep around the current values of D and q, so load them into their respective FIFOs as well:
To check the condition "D + q <= M" we will use the cl1 condition provided by the datapath (cl means Compare Less Than). The cl0 condition always compares A0 < D0, but the cl1 condition can be configured to compare A1 < A0, which is exactly what we need. If D + r is greater than M (A1 < A0), we take the left branch in our state machine above. If not, we ll go down the right branch. We could perform a NOP in our current state while we wait for the result of the condition. Instead, let's precompute a value that we need for the right branch, the longer of the two execution branches: M = M (D + q):
Let's first consider the right branch. We guessed correctly with our precompute! We are done with M for this iteration, so we can shove it back in the FIFO. The next thing we need to do is compute the new values of D and q, so bring them back into the accumulators:
We need to shift q one bit to the right, and there isn't much else we can do until then, so:
Now we can compute our new value for q by adding D to what we just shifted:
Now start to calculate our new value of D:
We need to shift D one more time. At the same time, we can take a shortcut from the state machine above. D starts at 2n 2 and gets shifted twice every iteration, producing the series 0x40, 0x10, 0x04, 0x01. That means that on the final iteration of our algorithm, when D is about to shift off, the first shift will have already taken care of shifting D to 0. We will shift D one more time for all of the other iterations, but we can check our condition in this state to see if we are ready to stop looping. The z0 condition is hardwired to compare (A0 == 0). If z0 is false, we need to jump back to state #1. We have our registers set up exactly how state #1 expects, so there is no more work to be done there. If z0 is true, we can bail out of our loop and prepare for the result to be read out.
Now we will examine the left branch of the state machine. We guessed wrong on our precompute, so the first thing we need to do is undo that. At the same time, we can bring D back into the accumulator so that it's ready to shift:
We aren't going to change M this iteration, so go ahead and put it back in the FIFO. This will allow us to load q into the accumulator. We don't need to add D to q in this branch, so we can go ahead and shift D before q is loaded:
We have two operations left to perform on this branch for this iteration. Both q and D need to be shifted once more, and then we need to check our condition to see if we stay in the loop. Luckily, we've already defined state #7 to shift D and check the condition. So to finish out the left branch, all we need to do is shift q and jump to state #7:
Now that we've defined all of the states in the computation loop, we have to think about the endgame. If we determine that we are done looping in state #7, we will get to state #11 with our final square root value sitting in A1. Nothing else matters at that point. All we need to do is make sure that F1 contains our final answer and nothing else.
For completeness, I've added an effective NOP state #12 so that we can pulse an eoc signal before going back into the idle state:
We have successfully mapped the algorithm to states and operations that can each be handled by the datapath. In the next post, I'll detail how to get these 13 states all configured in a single datapath.
Rating: Be the first to rate
Mar 10, 2013
If you've used the datapaths to perform operations, you know that they can perform an add or subtract and they can perform various logical operators such as AND, OR, and XOR along with shifting. A couple of the Cypress component development engineers have decided to take the capabilities of the datapaths much further. In a recent post to the Community Components section of the Cypress Forums, Richard Mc Sweeney has provided a component that can perform trigonometric functions using the CORDIC algorithm. Another component developer, CJ Cullen, has provided me with his implementation of a square root function using datapaths. I'll use the next several blog posts to go through CJ's description of the implementation he put together.
The approach we'll use for a square root algorithm is essentially a guess-and-check. To calculate the square root of N, we will continue to make a guess (let's call it root) until that guess is as close to sqrt(N) as possible without going over. At each guess, we will be checking if (root + Delta)2 <= N, where Delta gets progressively smaller until it is beyond the precision of our data type. If (root + Delta)2 is smaller than N, we will add Delta to root to get closer to our final answer. To do this intelligently (and efficiently on hardware), we will use powers of 2: Delta = 2n/2 - 1, 2n/2 - 2,.., 2, 1, where n is the maximum number of bits in any N we want our algorithm to handle. For you computer-sciency types, we are essentially performing a binary search of the solution space of our problem. This gives us a first cut of our algorithm:
Delta = 2n/2 - 1, root = 0
while (Delta > 0)
if ((root + Delta)2 <= N)
root += Delta
Delta >>= 1
This looks okay. We take advantage of the fact that Delta is a power of 2, and a shift-right can get us every value of Delta that we need. However, we still have a multiply at every step, and that is never going to be fast or easy in the PSoC UDBs. We can do better.
Instead of checking against N every step, let's instead keep track of the remainder instead. Let M = N - root2. We can calculate that at the end of each step and hold onto it for the next step. In our first try above, we were checking ((root + Delta)2 <= N). Expanding that out, it is equivalent to checking the condition (root2 + 2*root*Delta + Delta2 <= N), which we can reorganize as (2*root*Delta + Delta2 <= N - root2), finally substituting in M to get (Delta *(2*root + Delta) <= M). There are now two multiplies necessary to compute our condition. Fortunately, one is a multiply by 2, the other is a multiply by a power of 2, so all we really have is shifts, which we can do just fine in the UDBs. This gives us a much more efficient algorithm:
Delta = 2n/2 - 1, root = 0, M = N
while (Delta > 0)
if (Delta *(2*root + Delta) <= M)
M -= Delta*(2*root + Delta)
root += Delta
Delta >>= 1
This looks a lot better. All of the operations can be done efficiently in the datapath. Shifts and adds and subtracts are all one cycle apiece. But, the datapath can only shift one bit at a time. Multiplying Delta by (2*root + Delta) can take up to n/2 shifts. This isn't so bad, but it causes our algorithm to scale quadratically as we increase the number of bits. That still leaves another chance for optimization.
If we keep track of Delta and root, we are going to have to do some sort of multiplication by Delta at every step. Instead, we can build in the "multiply by Delta" into the variables that we track. We know that we will be shifting Delta one bit to the right at every iteration, so we can coordinate that shift with the other variables that we track. Let's define a variable D to track Delta*Delta and a variable q to track 2*root*Delta. Now, our condition has been simplified to (D + q <= M). Our old Delta variable was shifting right one bit every iteration, so our new D variable (Delta*Delta) should shift two bits to the right every iteration. Also, our q variable has a factor of Delta built into it, so we now need to shift that to the right by a bit each time through. With these substitutions, we come up with the following:
D = 2n-2, q = 0, M = N
while (D > 0)
if (D + q <= M)
M -= (D + q)
q += 2*D
D >>= 2
q >>= 1
In the final iteration, when the "D >>= 2" causes D to change from 0x01 to 0x00, it actually represents changing our Delta value from 20 to 2-1. Therefore, q (which we are using to track 2*Delta*root) is now equal to 2*(2-1)*root, our final answer.
We've now reduced our algorithm to a bounded number of shifts and adds for each iteration. In the next post I'll go over how we can get this algorithm executing in the datapath.
Rating: Be the first to rate
Dec 12, 2011
PSoC Creator 2.0 is now on the web with some valuable new components:
While you're working with these new components, we are busy working on the next set of new components. Rather than make you wait until PSoC Creator 2.1 before you see more new components, we have another new feature in Creator 2.0 that allows us to provide you new components much more quickly. The new feature is the capability to provide a Component Pack. With a Component Pack we can deliver brand new components and deliver new versions of existing components. We'll do this without changing PSoC Creator and without requiring a full new installation. You can install a new component pack quickly and then you can choose when to start using these new components and new component versions all without impacting what version of Creator is building your current designs.
We have several new components that are finishing up now that will go into the first Component Pack which will ship early next year.
For those of you that like to dig a little deeper into how things work you may notice that with PSoC Creator 2.0 there is an additional library dependency called CyComponentLibraryUpdates that is included by default. This is the library that will contain the new and updated components that ship in a Component Pack. This library is currently empty. With each component pack this library will add content and then when PSoC Creator 2.1 ships they will all migrate into the base CyComponentLibrary and we'll start building this update library again. The PSoC Creator component catalog merges all the content from these libraries together into a single combined view, so you don't need to think about what library your content is coming from. This is just the mechanism that we've added with PSoC Creator to make these rapid releases of content possible.
Rating: (4.5/5) by 2 users
1 to 5 of 22 Results | Next >
Content on this site may contain or be subject to specific guidelines or limitations on use. All postings and use of the content on this site are subject to the Terms and Conditions of the site; third parties using this content agree to abide by any limitations or guidelines and to comply with the Terms and Conditions of this site. Cypress Semiconductor and its suppliers reserve the right to make corrections, deletions, modifications, enhancements, improvements and other changes to the content and materials, its products, programs and services at any time or to move or discontinue any content, products, programs, or services without notice.