From list_ob at gmx.net  Fri Dec  3 14:28:55 2010
From: list_ob at gmx.net (Oliver Betz)
Date: Fri, 03 Dec 2010 15:28:55 +0100
Subject: Control deferred writes?
Message-ID: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>

Hallo All,

can I tell gcc not to defer writes, possibly only to certain 
variables?

In the coldfire V2 microcontrollers, consecutive register writes are 
slow. Consecutive writes to "off-platform" registers take 12 cycles 
IIRC.

Less a problem if the register writes are not consecutive IOW other 
instructions are executed between the writes.

Therefore it is detrimental to delay writes to these registers if it 
leads to consecutive writes.

After all, it's extremly hard to debug code where everthing is out of 
order. And no, I sometimes can't switch off optimization for 
debugging.

Oliver
-- 
Oliver Betz, Muenchen


From list-bastian.schick at sciopta.com  Fri Dec  3 14:38:26 2010
From: list-bastian.schick at sciopta.com (42Bastian)
Date: Fri, 03 Dec 2010 15:38:26 +0100
Subject: [coldfire-gnu-discuss] Control deferred writes?
In-Reply-To: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>
References: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>
Message-ID: <4CF900E2.90804@sciopta.com>

Am 03.12.2010 15:28, schrieb Oliver Betz:
> Hallo All,
> 
> can I tell gcc not to defer writes, possibly only to certain 
> variables?

No, not at all. If you need such, write assembly.

The compiler has no idea of the underlying hardware.
It might schedule instructions if it knows the CPU core, but not w.r.t.
bus timing.

-- 
42Bastian
+
| http://www.sciopta.com
| Fastest direct message passing kernel.
| IEC61508 certified.
+


From david at westcontrol.com  Fri Dec  3 15:34:48 2010
From: david at westcontrol.com (David Brown)
Date: Fri, 03 Dec 2010 16:34:48 +0100
Subject: [coldfire-gnu-discuss] Control deferred writes?
In-Reply-To: <4CF900E2.90804@sciopta.com>
References: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net> <4CF900E2.90804@sciopta.com>
Message-ID: <4CF90E18.7040107@westcontrol.com>

On 03/12/2010 15:38, 42Bastian wrote:
> Am 03.12.2010 15:28, schrieb Oliver Betz:
>> Hallo All,
>>
>> can I tell gcc not to defer writes, possibly only to certain
>> variables?
>
> No, not at all. If you need such, write assembly.
>
> The compiler has no idea of the underlying hardware.
> It might schedule instructions if it knows the CPU core, but not w.r.t.
> bus timing.
>

Writes from the Coldfire V2 core are in-order - there is no re-ordering 
write buffer, and the data cache is write-through.  Other Coldfire cores 
may have hardware that affects the ordering or buffering of writes.

The compiler does not know the timing of writes to various places.  Thus 
when scheduling it can only assume that writes all take a fixed number 
of cycles.

Since you don't have to use any cpu-specific instructions to enforce or 
control the ordering of the writes, the only issue is to control the 
compiler-generated write instructions.

There are three tools for doing that.  One is to write at least some 
parts of your code in assembly, as suggested.

Use of "volatile" is important.  All "volatile" writes will be generated 
in the order expected by the program, and you will get no more nor less 
than you ask for.  But note that non-volatile reads and writes can be 
re-ordered freely around the volatile reads and writes.

Remember also that it is possible to enforce volatile writes to 
non-volatile data by a bit of (slightly messy) casting:

     *(volatile int32_t *)(&foo) = 123;


The final tool is the memory block, usually written as:

     asm volatile ("" ::: "memory")

This tells the compiler that any writes to memory need to be completed 
before "excuting" the line (it generates no code by itself), and no data 
read before the line can be trusted after the line (i.e., any data in 
registers must be re-read).

Use volatile accesses, and memory blocks if needed, to enforce the write 
ordering that you require.  Then let the compiler handle the rest as 
best it can.


From list_ob at gmx.net  Fri Dec  3 16:05:48 2010
From: list_ob at gmx.net (Oliver Betz)
Date: Fri, 03 Dec 2010 17:05:48 +0100
Subject: [coldfire-gnu-discuss] Control deferred writes?
In-Reply-To: <4CF900E2.90804@sciopta.com>
References: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>, <4CF900E2.90804@sciopta.com>
Message-ID: <4CF9155C.26577.5C258720@list_ob.gmx.net>

42Bastian wrote:

(I sent a direct reply by accident, now to the list as intended)

> > can I tell gcc not to defer writes, possibly only to certain 
> > variables?
> 
> No, not at all. If you need such, write assembly.
> 
> The compiler has no idea of the underlying hardware.
> It might schedule instructions if it knows the CPU core, but not w.r.t.
> bus timing.

I wasn't asking for such optimizations, but I find many deferred 
writes where I can't see any benefit.

If the variable will be be written at some later time *) anyway, why 
does the compiler delay this write at all, IOW what is the intended 
benefit?

It doesn't require less code, and it doesn't save execution time as 
far as I see.

Oliver

*) e.g. "volatile". gcc of course respects "6.7.3 Type qualifiers" 
(including footnote 114) of ISO/IEC 9899:1999.
-- 
Oliver Betz, Muenchen


From list_ob at gmx.net  Sat Dec  4 12:44:07 2010
From: list_ob at gmx.net (Oliver Betz)
Date: Sat, 04 Dec 2010 13:44:07 +0100
Subject: [coldfire-gnu-discuss] Control deferred writes?
In-Reply-To: <4CF90E18.7040107@westcontrol.com>
References: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>, <4CF900E2.90804@sciopta.com>, <4CF90E18.7040107@westcontrol.com>
Message-ID: <4CFA3797.23328.60934038@list_ob.gmx.net>

David Brown wrote:

[...]

> control the ordering of the writes, the only issue is to control the 
> compiler-generated write instructions.
> 
> There are three tools for doing that.  One is to write at least some 
> parts of your code in assembly, as suggested.

I already squeeze out every cycle by using assembly code where 
necessary, but I prefer the compiler to yield good results.

> Use of "volatile" is important.  All "volatile" writes will be generated 
> in the order expected by the program, and you will get no more nor less 
> than you ask for.  But note that non-volatile reads and writes can be 

The "problem" is the concentration of deferred writes to volatile 
objects (e.g. at sequence points).

Volatile writes shall not be optimized out or reordered, so if there 
is a write to a volatile, why defer it at all?

[...]

> Use volatile accesses, and memory blocks if needed, to enforce the write 
> ordering that you require.  Then let the compiler handle the rest as 
> best it can.

Maybe gcc could do better.

Oliver
-- 
Oliver Betz, Muenchen


From art_bryan at hotmail.com  Sun Dec  5 21:54:59 2010
From: art_bryan at hotmail.com (Arthur Bryan)
Date: Sun, 5 Dec 2010 16:54:59 -0500
Subject: Linker Tutorial for embedded systems
In-Reply-To: <SNT140-w1CE20CB272FD356DCABE291230@phx.gbl>
References: <SNT140-w1CE20CB272FD356DCABE291230@phx.gbl>
Message-ID: <SNT140-w51232690CE5D91D07C8470912A0@phx.gbl>


Linker Tutorial for
embedded systems

 
The purpose of this tutorial is to show how to read the
linker script file and also the GNU linker document.  It does not go into every aspect of the
linker (m68k-elf-ld) but provides enough information to understand the CodeSourcery
linker script file and to use it in embedded baremetal systems.

 
It is necessary to understand the linker script file so as
to ensure that only the code you want in the final executable is there and the
source of that final executable. Knowing what is in the final code ensures that
you have the smallest code necessary for your embedded system.  It also can provide insight as to how best to
use static ram and cache in your system.

 
First some background; when you compile your program into an
executable file, the compiler creates one or many position independent object
files where this PIC code is placed in output sections.  

 
Position Independent Code (PIC)
code means that final address locations have not been resolved. Why you
ask?  On final linking which is what we
are addressing in this tutorial, you can choose to put your code at any
location within memory and so a function you wrote could be at location ox10 or
ox30.  You can decide where using the linker
script or let the linker do it automatically.

 
These output sections become the basis for the input
sections within the linker script. There is also additional code that is added
by way of the linker script file.  These
too have output sections.  This
additional code is used to configure the controller and then set up your
environment to execute your program from the main function.  This additional code comes from archive
object files such as libc.a, libcs3.a, libcs3coldfire.a, libcs3hosted.a,
etc.  I refer to these as archive files
because each file listed is a combination of several files.  To see what is in these files use the
following command.

 
C:\CodeSourcery\Sourcery G++
Lite\bin\m68k-elf-objdump.exe" -h -l -p -s -D  c:\CodeSourcery\Sourcery G++
Lite\m68k-elf\lib\libcs3.a" > c:\temp\objdump.txt

 
This file contains a number of object files.  Below are just a few:

 
start-sim.o

start_c.o

heap.o

demoem-reset.o

 
The linker is used to take your compiled source code and the
above archive files and put them in certain physical locations in the resulting
executable file and resolve function call addresses.  The linker does numerous other things but
these are it purpose.

 
There are a number of things to understand about the GNU linker
document. The first and most important thing to understand is the  ?input sections? , ?output sections? and the ?SECTIONS?
phrases.  In particular the input
sections actually comes from the output sections of your source object code in
which libcs3.a is apart.  Put another
way, when you compile and link your program your compiler first creates one or
more object files sometimes denoted with a ?.o? extension.  In these object files your code is broken
down into sections with names that are arbitrary but in most cases they are
.text. .data, .bss, etc.  Knowing the output
section names you can then use them to map to the output sections of your final
executable.  The period before the .text
is just part of the name and bears no other significance.

 
Now the resulting executable file is in some sort of
format.  It may be in ELF, S19, PE or
another format.  We will only be
considering ELF in our tutorial.  And so
the SECTIONS keyword defines the layout of the final executable.  Below shows the syntax for the SECTIONS
phrase.  Think of the contents of linker
file as being the skeleton layout similar to laying out a book.  Think of the SECTIONS as the body of the
document.

 
SECTIONS

{

} /*End of sections*/

 
There
is only one SECTIONS phrase
in a linker file and within the curly brackets we then define the output
sections that go into the exactable. 

 
SECTIONS

{

 
     ".text"
:

     {

     }

 
.data :

     {

}

.bss.eh_frame :

     {

}

.eh_frame_hdr :

{

}

.rodata :

{

}

 
.cs3.rambar :

{

}

 
} /*End of sections*/

 
Within
the SECTIONS phrase you
have ".text", .data, .bss.eh_frame,
.eh_frame_hdr, .rodata, .rodata.  These
are just some ?output sections?
that are being defined for the resulting executable you are creating.  You will notice that ".text" has quotes and the others do not, I
added the quotes to demonstrate that it is just a name and the period before it
has no special meaning.  The other output
sections could also be quoted.  What is
unique is that these names are arbitrary and you could call them ?dog? and
?cat?.  The reason why you will see so
many examples with .text and .data is
that it is a holdover from the a.out binary format.

 
Now
each output section can have one or more input sections from the source object
files you have compiled and the added archive files.  Remember when you compiled your main.c file,
the linker also added some archive files at link time to get your application
to work on your microcontroller.  Think
of the input sections as the paragraphs to each chapter where the chapter is
the output sections e.g. ".text"
and .data.  You take paragraphs of information from your
research which in this case is your source object files and the added archive files.  The ?*(.text)?
below is an input section that states ?if any of my source object files and
archive files has a .text section place the code within this section
here.?  The asterisk (*) means any.

 
".text" :

     {

*(.text)
/*line 1*/

*(.text1
.rdata) /*line 2*/

*(.text2)
*(.rdata) /*line 3*/

Foo.o (.text3)
/*line 4*/

     }

 
Line 2 from above states to put .text1 and .rdata input
section code here randomly.  Line 3
states put .text2 and then .rdata input section code here.  Line 3 could have put the .rdata input
section on the next line down. And finally line 4 states put .text3 input
section code from the Foo.o file here. 
Also because each line is sequential, code will be added
sequentially.  

 
Let?s turn to where we want to put our code in memory.  We have several methods of placing our ".text" output section and the other output sections in memory but I will only demonstrate
one.  First we will define our memory
regions in our microcontroller.  This is
done with the below phrase.

 
MEMORY

{

  rom (rx) : ORIGIN = 0x00000000, LENGTH = 2M

  ram (rwx) : ORIGIN = 0x40000000, LENGTH = 16M

  rambar (rwx) : ORIGIN = 0x80000000, LENGTH =
16K

}

 
We have given each memory range a name: rom, ram and
rambar.  So now we will tell the linker
to put ".text" in a
particular location with the ?>rom?
phrase.

 
".text" :

     {

/*input
sections here*/

     }
>rom

 
This
will place the ".text" code
in rom starting at 0x00000000

 
Note that there is only one MEMORY phrase and it comes
before the SECTIONS phrase.

 
I
have also seen this syntax  ?>rambar
AT>rom? in the codesourcy linker files but am unaware of how this works.
RESEARCH NEEDED.

 
".text" :

     {

/*input
sections here*/

     } >rambar
AT>rom

 
Above
I have shown that I can place my input sections sequentially in my output
section however each input section may not start at the most optimum location
so I can adjust them individually by doing any of the following:

 
. = ALIGN (8);

_bdata = (. + 3) & ~ 3;

variable = ALIGN(0x8000);

 
I suggest only using the first statement since it adjusts
the location counter to a location where the next input section listed after it
will start.  

 
We now turn to the ?.? which is called the location
counter.  Think of this as a program
counter whose address increments as input sections are added sequentially to
each output section.  You may see the
location counter manipulated as above to change its address or get its address.  The ld.pdf demonstrates why you would want
its address.

 
Note. Do not confuse this dot ?.? with those added to the
above variable names.  This is just a
coincidence.  The dot has to stand alone
with white space before and after it.

 
Let?s turn to reading a linker script from CodeSourcery.  I will annotate each phrase just below it.

 
OUTPUT_ARCH(m68k)

/*This tells the linker what kind of controller it is
creating the final executable for*/

 
ENTRY(__cs3_reset)

/*This identifies the first executed location.  An object dump of a simple application
reveals that the second address holds the address of this location. I am using
the m5208evb-rom.ld linker file and building against the coldfire 5208*/

 
SEARCH_DIR(.)

/*It tells the linker to search the current location of the
linker file for the libraries below*/

 
GROUP(-lgcc -lc
-lcs3 -lcs3unhosted -lcs3coldfire)

/*This state that all the above archive files have to be
included in the linking and ALL PARTS RESOLVED or an error will result.  ?lgcc is the libgcc.a and ?lc is libc.a
.  Look in location ?C:\CodeSourcery\Sourcery
G++ Lite\m68k-elf\lib? for these files*/

 
MEMORY

{

  rom (rx) : ORIGIN = 0x00000000, LENGTH = 2M

  ram (rwx) : ORIGIN = 0x40000000, LENGTH = 16M

  rambar (rwx) : ORIGIN = 0x80000000, LENGTH =
16K

}

/*We explained the MEMORY phrase before. The (rx) tells the
linker what can be done to these regions. 
Rwx is read, write, execute */

 
EXTERN(__cs3_reset
__cs3_reset_m5208evb)

EXTERN(__cs3_start_asm
_start)

/* Bring in the
interrupt routines & vector */

INCLUDE
coldfire-names.inc

EXTERN(__cs3_interrupt_vector_coldfire)

EXTERN(__cs3_start_c
main __cs3_stack __cs3_heap_end)

 
/* Provide fall-back
values */

PROVIDE(__cs3_heap_start
= _end);

PROVIDE(__cs3_heap_end
= __cs3_region_start_ram + __cs3_region_size_ram);

PROVIDE(__cs3_region_num
= (__cs3_regions_end - __cs3_regions) / 20);

PROVIDE(__cs3_stack
= __cs3_region_start_ram + __cs3_region_size_ram);

/* These force the
linker to search for particular symbols from

 * the start of the link process and thus
ensure the user's

 * overrides are picked up

 */

 
SECTIONS

{

  ".text" :

  {

    CREATE_OBJECT_SYMBOLS

/*Currently unaware of its intended purpose.  I believe it is for debugging with the map
file using the map file.  More info can
be found here. http://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_20.html

*/

    __cs3_region_start_rom = .;

    _ftext = .;

/*Think of these as linker variables to the location where
they are listed.  Although they are
listed sequentially they have exactly the same address.  You can use them as location pointers into
your final compiled program.  Because the
".text" output section
is in rom and rom starts at 0x00000000 then
both these variables has an address of 0x00000000

*/

    *(.cs3.region-head.rom)

/*This states to place all code from my input object files
(i.e. my archives and compiled application code) that lie within their .cs3.region-head.rom output section if any, in this area.  This output section refers to those within
the archive file and my compile code only, and not within the output sections I
am now building.*/

 
    ASSERT (. == __cs3_region_start_rom,
".cs3.region-head.rom not permitted");

/*This is used to throw an error.  Here we see that if ?. ==
__cs3_region_start_rom? i.e. nothing was added then throw an error.  You will notice that  __cs3_region_start_rom  held the location before the .cs3.region-head.rom and was evaluated
against the dot ?.? after the region input section ends. A more readable
version could be:

 
?

__start-region-head
= .;

*(.cs3.region-head.rom)

__end-region-head =
.;

ASSERT
(__start-region-head == __end-region-head, ".cs3.region-head.rom is
empty");

?

  */end of example.

 
__cs3_interrupt_vector
= __cs3_interrupt_vector_coldfire;

*(.cs3.interrupt_vector)

/* Make sure we
pulled in an interrupt vector.  */

ASSERT (. !=
__cs3_interrupt_vector_coldfire, "No interrupt vector");

/*This throws an error also. */

  
PROVIDE(__cs3_reset
= __cs3_reset_m5208evb);

/* The PROVIDE phrase basically states that if your program
defines it e.g (int __cs3_reset =20;)
the program version will be used.  If
your program defines __cs3_reset as
external and only references it then the statement within the brackets ?()? is true.

*/

 
    *(.cs3.reset)

/*Again this means place all the code within your .cs3.reset  source output sections of your application and
included archive files here */

 
    _start = DEFINED(__cs3_start_asm) ?
__cs3_start_asm : _start;

 
/*Looks familiar? It should it?s a c if statement syntax.
This however is not c code being compiled. 
It basically states that if __cs3_start_asm
is in the linker
global symbol table (ld.pgf, pg 73) then preserve it else use _start */

 
    *(.text.cs3.init)

    *(.text .text.* .gnu.linkonce.t.*)

/*We have seen these before. 
The .text.* uses the wildcard ?*? to say put anything that resembles it
here. Also because it is inside the parenthesis it is intermixed with the other
source output sections as opposed to being added sequentially as would be
assumed.

*/

   
 . = ALIGN(0x4);

/*We have also seen this before.  This tells the location counter to go to the
next 4 byte boundary. We do this to ensure that if the previous input sections
did not end on a 4 byte boundary that the linker will not fill it new input
sections. Why do this?  Some processors
move data into cache in 4, 8, 16 and even 64 byte chunks. Because of this the
processor may have to access memory several times to move your code and thus
increase the processing time and slowing your speed. Some processors I believe
will even stop processing or trash to process your code.  There is a phrase SUBALIGN (Ld.Pdf, Paragraph
3.6.8.4)
in the linker document that allows you to even breakup your input section code
and place them on boundaries.*/

 
    KEEP (*crtbegin.o(.jcr))

/*If ?--gc-sections? is used then the
linker will throw away sections if understands cannot be executed.  The KEEP phrase tells the linker to leave
this input section in place*/

 
    KEEP (*(EXCLUDE_FILE (*crtend.o) .jcr))

/*The EXCLUDE_FILE
(*crtend.o) tells the linker not to include any .jcr section from the crtend.o file */

 
    KEEP (*crtend.o(.jcr))

/*Now here it added the .jcr sections from the crtend.o file */

 
    . = ALIGN(0x4);

    *(.gcc_except_table .gcc_except_table.*)

  } >rom

 
I have not covered using linker symbols from your
application code.  You can think of
linker symbols as address references to locations within the final linked
application. These can be used to identify and move areas of code and data from
one location to another or access this data placed directly by the linker
instead of your program.  Section 3.5.4
Source Code Reference of the ld.pdf gives an example

 
I have also not covered how to replace those predefined libraries with
your own.

 
Summary.

 
When you create your final executable program it goes through several
steps.  First your source code is
compiled into position independent object code that has output sections.  Next, these output sections become the basis
for input sections of your linker script. 
Your linker script then places these various output sections now input
sections into output sections for your final executable.  Since we are dealing with the elf format and
using similar section names then talking about output and input sections
sometimes become ambiguous. 

 
The linker uses dot ?.? location pointer to identify and resolve
addresses within the final executable. What was once PIC code in now position
dependent but know because it was defined by the linker script. We can also
manipulate the dot location pointer to move to various locations to place our
code.

 
Please do hesitate to comment or add to this tutorial.

 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://sourcerytools.com/pipermail/coldfire-gnu-discuss/attachments/20101205/cc566002/attachment.html>

From tom_usenet at optusnet.com.au  Tue Dec  7 04:04:18 2010
From: tom_usenet at optusnet.com.au (Tom Evans)
Date: Tue, 07 Dec 2010 15:04:18 +1100
Subject: [coldfire-gnu-discuss] Control deferred writes?
In-Reply-To: <4CFA3797.23328.60934038@list_ob.gmx.net>
References: <4CF8FEA7.23858.5BCCD3BB@list_ob.gmx.net>, <4CF900E2.90804@sciopta.com>, <4CF90E18.7040107@westcontrol.com> <4CFA3797.23328.60934038@list_ob.gmx.net>
Message-ID: <4CFDB242.3020802@optusnet.com.au>

Oliver Betz wrote:
> Volatile writes shall not be optimized out or reordered,

As long as you don't run into these:

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

Tom