In the previous post, we came up with the 13 states necessary to calculate the square root. There's no limit on the number of states we can have, but the number of unique operations is limited to 8. Currently we have 10, so we'll need a few tricks to make this work.

Notice that I haven't listed the "(load F0/F1)" as part of the operations. FIFO loading is controlled directly from our state machine, independent of the operation.
We need to trim out two operations.
First, notice that in state 11 (op 9), we only care about the "A1 = F1" part. It doesn't matter at all what happens to A0, so we can have state 11 reuse op 8.
Second, notice that op 5 and op 7 are similar, except that op 7 also loads F0 into A0. State 5 currently starts with an empty F0, but if we modify state 4 to load A0 into F0, state 5 can copy that value back from F0 into A0. With that change, state 5 can reuse op 7.
We now have the following operations:
And the following states:
Or in its state diagram form:
The next trick is in mapping the states to the operations using Verilog. We have 13 states, so we'll need 4 bits to encode them. Conveniently, no operation is used in more than two states. This allows us to use the bottom three bits to encode the operations we defined above, while using the top bit to distinguish between the two states that use that operation. With this mapping, the operation select is simply the low three bits of the state register, so no additional resources are needed to map from states to operations.
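As a minimal sketch of this encoding idea, the module below shows two states sharing one operation. The module name, port names, state names, and bit patterns are all hypothetical and chosen only for illustration; they are not the actual encoding used for the square root state machine.

```verilog
// Sketch only: every name and bit pattern here is hypothetical.
module state_op_sketch (
    input  wire       clk,
    input  wire       reset,
    output reg  [3:0] state,  // 4-bit state register
    output wire [2:0] op      // operation select for the datapath
);
    // Two hypothetical states that share operation 3'b011.
    // Only the top bit differs between them.
    localparam [3:0] STATE_X = 4'b0_011;
    localparam [3:0] STATE_Y = 4'b1_011;

    // The operation is just the low three bits of the state,
    // so no decode logic sits between the state and the operation.
    assign op = state[2:0];

    // Placeholder transitions, just to make the sketch complete.
    always @(posedge clk) begin
        if (reset)
            state <= STATE_X;
        else case (state)
            STATE_X: state <= STATE_Y;
            default: state <= STATE_X;
        endcase
    end
endmodule
```

In both `STATE_X` and `STATE_Y` the `op` output is `3'b011`, while `state[3]` tells the two states apart for the transition logic.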
It's usually good practice to give meaningful names to states, but since I've been using numbers above, I'll continue to name them with numbers here:
I've noted in the comments which states we want to trigger a FIFO load. We use this to define wires that will be used as control signals to the datapath:
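As a rough illustration of what such control wires could look like, the sketch below compares the state register against specific states. The signal names `f0_load` and `f1_load`, the module name, and both state encodings are assumptions made for this example. The only detail taken from the text above is that state 4 triggers an F0 load; the state that triggers the F1 load is a placeholder.

```verilog
// Sketch only: names and encodings are hypothetical.
module fifo_load_sketch (
    input  wire [3:0] state,
    output wire       f0_load,  // asserted in states that load F0
    output wire       f1_load   // asserted in states that load F1
);
    // Hypothetical encoding for state 4 (which loads A0 into F0).
    localparam [3:0] STATE_4 = 4'b0_100;
    // Placeholder for a state that loads F1; not taken from the post.
    localparam [3:0] STATE_F1_LOAD = 4'b1_010;

    // Each control signal is a simple comparison against the state.
    assign f0_load = (state == STATE_4);
    assign f1_load = (state == STATE_F1_LOAD);
endmodule
```

If several states needed to load the same FIFO, the comparisons could be ORed together into a single wire.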
We're almost done at this point. The next blog post will conclude the implementation of the square root component.