PSOC 4 inline assembly ARM or Thumb? | Cypress Semiconductor

Summary: 17 Replies, Latest post by paul.chambre_1977636 on 20 Nov 2016 10:28 PM PST
Verified Answers: 1
paul.chambre_1977636's picture
User
45 posts

Hi folks,

I'm trying to embed some inline assembly in my project to work around areas where I can't get the compiler and optimizer to do what I want.

I pulled the lst file for the main loop, and I can see a couple of places that I want to tweak, to save a few precious clock cycles. What has me confused is that the lst formatting looks like ARM assembly, but AN89610 (Code Optimization document for PSOC 4) says that PSOC 4 uses Thumb 2.

Is there a way to choose ARM or Thumb?

I modified the main-loop portion of the lst code and swapped it in within asm(" .... "). The result is a "Cannot represent THUMB_OFFSET relocation in this object file" error during build. The error references a .s file and line number, but I can't find the actual file. I'm guessing it's just a temporary file created during the build process?

At this point, I'm getting almost the performance I need from my solution in C, but I have tried many variants, and only been able to find options that don't work. I do think I need the extra savings I can get from using assembly in the critical sections.

Thanks for the help,

Edit: I think I see the cause of the error: I missed that an array I'm using is represented as a label within the assembly block. I'll need to fix the assembly to correctly address the array. I'm still rather confused about the ARM vs. Thumb stuff, though.

 

Paul

paul.chambre_1977636's picture
User
45 posts

OK. Here's what I'm seeing now.

The label I had overlooked (.L61 in my project) is the base address for the addresses of the port status and data registers. Some of these are pre-loaded into registers before the block of code that I'd like to modify, but a couple of them are read "on the fly" (.L61+16 or .L61+20).

Any recommendations on accessing the port registers reliably through inline assembly? I would think even the pre-loaded ones should be something I'm not counting on.

user_365962704's picture
User
163 posts

I think Cortex-M devices (ARMv6-M for the Cortex-M0 in the PSoC 4, ARMv7-M for the Cortex-M3/M4) support only the Thumb instruction set (the ARM instruction set was used on older cores).

Labels like .L61 seem to be literal pools, and I think using literal pools is the best way to access registers. In any case, you cannot encode an arbitrary 32-bit immediate value in a single instruction, so the value has to come from somewhere else. That is what "ldr rX, =value" does under the hood: it is a pseudo-instruction (a macro, not a real instruction) that the assembler turns into a PC-relative load from a literal pool.

Here are some useful links:

https://community.arm.com/docs/DOC-7869

http://www.ethernut.de/en/documents/arm-inline-asm.html

 

I'm trying to learn inline asm too. Remember to use asm volatile (...); that way the compiler will not remove or reorder your inline asm.
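As a minimal sketch of the asm volatile advice, the helper below (a hypothetical name, not from the thread) executes a fixed number of nop instructions. The nop mnemonic is valid on both Thumb and x86, so the sketch should build with GCC or Clang on the target and on a desktop. GCC already treats an asm statement with no output operands as volatile, so the keyword here mostly documents intent; it becomes essential once the statement has outputs.

```c
#include <stdint.h>

/* Hypothetical helper: runs `count` nop instructions and reports how
 * many loop iterations were executed. */
static uint32_t delay_nops(uint32_t count)
{
    uint32_t executed = 0;
    uint32_t i;

    for (i = 0; i < count; i++) {
        /* volatile tells the compiler this statement has side effects
         * and must not be deleted or merged with its neighbours. */
        __asm__ volatile ("nop");
        executed++;
    }
    return executed;
}
```

Calling delay_nops(10u) returns 10 after executing ten nops; the actual time spent depends on the core clock and on loop overhead.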

Hope it helps

paul.chambre_1977636's picture
User
45 posts

Thanks.

I started with the ethernut cookbook before posting here. It seemed helpful, but it contradicts much of what the inline assembly section says in http://www.cypress.com/file/46521/download.

According to the Cypress documentation, something like this:

            "ldr r4, =CYREG_PRT2_PS\n"    //read address high
            "ldr r3, [r4]\n"

should work but, in fact, causes a compiler failure with no useful diagnostic. That's the piece I'm trying to figure out at the moment.
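One likely reason for the failure, assuming CYREG_PRT2_PS is a C preprocessor macro for a numeric address: the preprocessor does not expand macros inside the asm string, so the assembler sees CYREG_PRT2_PS as an unknown symbol. A hedged sketch of an alternative is to pass the address in as an operand and let the compiler load it. The names read_reg32 and fake_port_status are hypothetical, and the plain C branch exists only so the sketch also builds on a non-ARM host.

```c
#include <stdint.h>

/* Stand-in for a port status register so the sketch runs on a PC.
 * On the PSoC 4 one would pass (const volatile uint32_t *)CYREG_PRT2_PS,
 * assuming that macro expands to the register address. */
static volatile uint32_t fake_port_status = 0xA5u;

static uint32_t read_reg32(const volatile uint32_t *addr)
{
    uint32_t value;
#if defined(__thumb__)
    /* "l" asks for a low register (r0-r7), as Thumb loads require.
     * The compiler materialises the address, typically from a
     * literal pool, before this instruction runs. */
    __asm__ volatile ("ldr %0, [%1]"
                      : "=l" (value)
                      : "l" (addr)
                      : "memory");
#else
    value = *addr;  /* plain C fallback for non-ARM hosts */
#endif
    return value;
}
```

With this form no register is named by hand, so the compiler knows exactly which registers the statement reads and writes.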

paul.chambre_1977636's picture
User
45 posts

Then again, I'm not too sure about the Cypress doc. I tried this example from it (page 15), and it won't even compile, let alone build:

    int foo = 5L;
    int bar;
 
    bar = foo + 1;
 
    /* bar = foo + 1 */   
    asm("LDR r0, =foo\n"        
        "LDR r1, =bar\n"        
        "LDR r2, [r0]\n"        
        "ADD r2, r2 #1\n"        
        "STR r2, [r1]"); 

Removing the extra(?) r2 in the ADD line allows it to pass the first pass of the compiler, but it still fails with the same general error as I was getting from my code.
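Two things look wrong in the quoted example: the ADD line appears to be missing a comma before #1, and "=foo" asks the assembler for the address of a symbol named foo, which does not exist if foo is a local variable. A minimal sketch that sidesteps both by using operands is shown below. The function name add_one is hypothetical, and the asm assumes GCC's default (divided) syntax for inline assembly, in which this Thumb add updates the flags on Cortex-M0, hence the "cc" clobber.

```c
#include <stdint.h>

/* Hypothetical rewrite of "bar = foo + 1" using operands instead of
 * hand-picked registers and symbol names. */
static int32_t add_one(int32_t foo)
{
    int32_t bar;
#if defined(__thumb__)
    __asm__ ("add %0, %1, #1"   /* note the comma before #1 */
             : "=l" (bar)       /* output: any low register */
             : "l" (foo)        /* input: any low register */
             : "cc");           /* the add may change the flags */
#else
    bar = foo + 1;              /* plain C fallback for non-ARM hosts */
#endif
    return bar;
}
```

The compiler takes care of loading foo into a register and storing bar afterwards, so the LDR and STR lines from the original example are no longer needed.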
 

paul.chambre_1977636's picture
User
45 posts

This almost works:

        asm(
            "ldr r3, [%[datawrite]]\n"  //Get address of data write function
            "mov    r2, #125\n"
            "str r2, [r3]\n"  //write data bus from r2
            :
            :   [addhigh] "l" (Pin_Address_High_PS),
                [addlow] "l" (Pin_Address_Low_PS),
                [datawrite] "l" (Pin_Data_DR),
                [dataread] "l" (Pin_Data_PS),
                [pinrw] "l" (CYREG_PRT0_PS),
                [buffer] "l" (buffer)
            );
 

It compiles, and builds, and programs, but it doesn't actually set the 125 value on the data bus. However, the resultant lst output does look pretty similar, so I think I'm close to getting this right.

On the other hand, one difference I observed between what's in the lst files and what will compile is the f suffix after the label of a forward jump. The generated assembly in the lst files had this, but it caused a compile failure when I used it in my code.

paul.chambre_1977636's picture
User
45 posts

OK. This is not making much sense. I tried really simplifying the C code, to compare the lst output to the lst output from my really simplified inline assembly. To my eye, the resultant lst content looks logically equivalent, but the C works, and the inline assembly does not. By "works", I mean that the C writes 125 to the register associated with Pins_Data, and the assembly leaves that as all high values (255).

The two results (source and lst) are attached (because of the Cypress spam filter that seems to fire whenever too much example code is embedded in a post).
 

paul.chambre_1977636's picture
User
45 posts

I got the really simple case working: writing a byte to a pin set:

         asm(
            "mov r3, %[datawrite]\n"  //Get address of data write function
            "mov    r2, #125\n"
            "str r2, [r3]\n"  //write data bus from r2
            :
            : [datawrite] "l" (&Pin_Data_DR)
            );

The optimizer still messes it up a bit, but it works. Now just to figure out all the other issues....
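A possible reason the optimizer interferes with the snippet above is that it uses r2 and r3 without declaring them, so the compiler may be keeping its own values in those registers. A hedged sketch that lets the compiler choose the registers and declares the memory side effect follows. The names write_reg32 and fake_data_reg are hypothetical; on the PSoC 4 the address would be &Pin_Data_DR as in the post.

```c
#include <stdint.h>

/* Stand-in for the port data register so the sketch runs on a PC. */
static volatile uint32_t fake_data_reg = 0xFFu;

static void write_reg32(volatile uint32_t *addr, uint32_t value)
{
#if defined(__thumb__)
    /* Both operands are inputs; the "memory" clobber tells the
     * compiler that the statement writes to memory it cannot see. */
    __asm__ volatile ("str %1, [%0]"
                      :
                      : "l" (addr), "l" (value)
                      : "memory");
#else
    *addr = value;  /* plain C fallback for non-ARM hosts */
#endif
}
```

Calling write_reg32(&fake_data_reg, 125u) stores 125 at the given address, which mirrors the write of 125 to the data bus in the post.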
 

user_1377889's picture
User
9583 posts

Well, Paul, I can assure you that it is a challenge to do better than the GCC optimizer! Did you try setting the optimization level to "speed" or "size"? (You can do that on a per-.c-file basis.) There are also other settings to try, described in the GNU compiler manual.

 

Bob

paul.chambre_1977636's picture
User
45 posts

Yes. It's built using speed optimization. I also had to add some noinline optimizer hints to keep the optimizer from really messing up parts of it.

The inline assembly is to deal with areas where an if/else would be more efficient than the current code, but the optimizer won't accept it. The optimizer is also doing things like checking an if condition both at the top and bottom of a block of code.

user_342122993's picture
User
579 posts

Paul, it seems that you are trying to read and set pin values quickly. ASM is not required to do that. As far as I can remember, C code allows a 1-tick pin read and a 2-tick pin set, and that is as fast as it can be done. You can find examples by searching for 'toggle pin fast'. I have a PSoC5 example somewhere (which does not apply to PSoC4), but someone else can help you with an example if you 'ask the right question'.
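A minimal sketch of the plain C approach, assuming the port registers are reached through volatile pointers: each access below should compile to a single load or store, with no assembly needed. The names port_write, port_read and fake_port_dr are hypothetical; on the PSoC 4 the pointers would refer to the real port data and pin state registers.

```c
#include <stdint.h>

/* Stand-in for a port register so the sketch runs on a PC. */
static volatile uint32_t fake_port_dr;

/* Write all bits of a port data register in one store. */
static inline void port_write(volatile uint32_t *dr, uint32_t bits)
{
    *dr = bits;
}

/* Read a port state register in one load. */
static inline uint32_t port_read(const volatile uint32_t *ps)
{
    return *ps;
}
```

The volatile qualifier keeps the compiler from caching or dropping the accesses; the actual cycle counts depend on the device and the optimization settings.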
