Monday, 8 August 2011

Mapping the Absolute Addresses of Registers from a C Header

Why am I doing this, you may ask? Doesn't the microcontroller manufacturer provide the absolute addresses of the registers? Well, in this case, no, they don't, and I need them to complete the plugin for the Keil µVision debugger which will make them visible and editable via human readable names.

There are a number of ways of mapping hardware into C but, due to the limitations of the const keyword [did you know you can cast the const away? Add it to the list of C quirks, like = actually meaning assign, or declaring a pointer with * and then dereferencing its content with *, etc., etc.], most microcontroller manufacturers end up using #define statements, as these can be made to resolve to fixed numeric constants that can then be optimized correctly in all situations.

[For a quick counter example create a function that accesses a global const pointer, call it and then look at the ASM. Does that look optimal to you? But don't blame the compiler - it can't assume you haven't messed with the const pointer with casts! A good discussion is here.]

Anyway, the normal way is to begin with several layers of #defines, for maintainability and portability. Typically the registers for a microcontroller peripheral are created at a fixed offset from a base address, and the manufacturer will typically want to reuse the same peripheral in several micros. So this is how Fujitsu arrange things for the USB module in the MB9BF506R, an ARM Cortex M3 based microcontroller:

#define FM3_PERIPH_BASE    (0x40000000UL)


#define FM3_USB0_BASE      (FM3_PERIPH_BASE + 0x42100UL)

So the USB peripheral is actually to be found starting at address 0x40042100UL. The layout of the individual device registers, however, is described by a C struct. The struct creates the offsets for the individual registers internally and maps them to human readable names.

Structs have several rules imposed on them by the C standard and also have a few gotchas, so please treat these header files with great care when using a different compiler UNTIL you have verified the layout. The rules are:
  1. The named elements of a struct will appear in the same order as they are declared in C.
  2. Suitable padding may be inserted between elements to speed access to the data.
For more information refer to this article:
    [If you don't want suitable padding to be added then consider using the pack attribute, e.g. in gcc #pragma pack. This can have terrible performance consequences so as always, know what you are trying to achieve.]

    Here is a fragment of C from the mb9bf506r.h header file which creates the C symbols for the appropriate registers:

    typedef struct
    {
      union {
        union {
          __IO uint16_t HCNT;
          stc_usb_hcnt_field_t HCNT_f;
        };
        struct {
          union {
            __IO  uint8_t HCNT0;
            stc_usb_hcnt0_field_t HCNT0_f;
          };
          union {
            __IO  uint8_t HCNT1;
            stc_usb_hcnt1_field_t HCNT1_f;
          };
        };
      };
      ...

    So, following the rules of structs in C, the first element is an unsigned, 16 bit wide register called HCNT at the first address of this struct, 0x40042100UL. Note the use of unions to allow the same address in memory to be accessed in several different ways. In this case they enable HCNT to be accessed as one 16 bit wide register or as two separate 8 bit ones called HCNT0 and HCNT1.

    Let's look at the same piece but with the next registers' definitions included:

      union {
        union {
          __IO uint16_t HCNT;
          stc_usb_hcnt_field_t HCNT_f;
        };
        struct {
          union {
            __IO  uint8_t HCNT0;
            stc_usb_hcnt0_field_t HCNT0_f;
          };
          union {
            __IO  uint8_t HCNT1;
            stc_usb_hcnt1_field_t HCNT1_f;
          };
        };
      };
      uint8_t RESERVED0[2];
      union {
        __IO  uint8_t HIRQ;
        stc_usb_hirq_field_t HIRQ_f;
      };
      union {
        __IO  uint8_t HERR;
        stc_usb_herr_field_t HERR_f;
      };
      uint8_t RESERVED1[2];

    The line of real interest is uint8_t RESERVED0[2]; between the two registers. The memory map of an ARM device is byte addressed, i.e. each byte gets a unique address. So the 16 bit HCNT register sits across two bytes (which the union allows individual access to). However the main memory of an ARM system is typically organized in 32 bit words, i.e. two 16 bit halfwords or four bytes. The uint8_t RESERVED0[2]; array of two bytes 'pushes' the next register (the 8 bit HIRQ) to the start of the next 32 bit word. This means that HCNT is at 0x40042100UL while HIRQ is at 0x40042104UL, 4 bytes offset from the USB peripheral base.

    Sunday, 24 July 2011

    The External Memory Interface as an Educational Tool

    In a recent email conversation with an engineer I ended up discussing the advantages of an external memory interface (EMI) over general purpose I/O (GPIO) on a microcontroller for the purposes of education. This short piece comes from that discussion.

    The question posed was this:
    "What advantages does an external memory interface have over driving a peripheral with GPIO?"
    ... which immediately brings out the question, what do I mean by an external memory interface and what do I mean by GPIO?

    General Purpose Input Output (GPIO)
    This typically refers to a pin of the chip which is not necessary for any of the operations of the chip itself but is available to the system designer. Put another way, if none of the GPIO pins are used then the chip will still function just fine. The pins are under software control and can be turned into either inputs, allowing data to be read from the outside world, or outputs, pushing data to the outside world.

    External Memory Interface
    In modern digital computer architecture the standard method of connecting a processor core to the outside world is with a memory interface. This is typically formed from an address bus, a data bus (either bi-directional or two uni-directional ones) and some control lines to indicate whether data is being written or read.

    Normally this is present inside the chip but forever hidden from the designer, and only a few designs allow it to be exposed. Those that do allow extra RAM or ROM memory to be added to the fixed amount of a normal microcontroller design.

    Engineering Advantages of EMI

    Direct memory mapped busses have a couple of advantages over GPIO so, as usual, whether you care depends on what you are doing (a typical engineer's comment!). Here are the two main differences:
    1. Native machine code access. Grants speed, reduced code size and I would guess lower power? No extra instructions to drive the GPIO and turn the bus around to slow you down. Simply create pointers and data structures as usual and prod the linker to place them in the external chips. Or use explicit addresses.
    2. Code execution from external memory. Related to the above.
    1. is really only an issue when speed is at stake. Code size isn't that big a problem most of the time and the software wrapper around GPIO won't be very large. Certain ADCs and DACs (usually for video or software radio) pretend to be SRAM for extremely quick access and would really require this method of access to function effectively.

    2. only matters if you have run out of on-chip non-volatile memory and have to move sections of your program off chip.

    [Just speculating re. 2. code execution: you could change the compiler to emit instructions to drive a GPIO for data memory accesses in a certain range. However I can't work out how to do a native instruction fetch from the external memory without a little code in internal flash copying it to an area of RAM first. Perhaps a block copy strategy held in flash as a bootloader type thing could minimize the pain of this if it was invoked as appropriate by a compiler or linker? Note that in the world of computers I have discovered that someone has always built an implementation of just about every whacky idea. If you know of a system that uses this paradigm then please let me know. PC's don't count.]

    In final summary, if you are accessing a data storage area or a peripheral occasionally and don't care about 1. or 2. then a simple software wrapper around GPIO would do just fine and be relatively efficient. I would guess 3 to 5 times slower?

    Educational Advantages of EMI

    From an educator's point of view, when teaching computer architecture and interfacing, an EMI is gold in terms of hammering what is happening inside all general purpose computers into the heads of students. I mean simply this basic idea:

    "A CPU issues an address onto the bus with control signals and writes or reads data."

    This very basic idea is surprisingly poorly understood by many students as it is completely hidden from them inside micros or PCs. In many EE or even CompE courses the students never actually physically observe it in action or build their own peripheral and code software to drive it. As long as it remains a mental exercise a substantial portion of the students will not really understand it and without understanding it their ability to understand computers and software suffers.

    My colleagues (past and present) and I have gone to great lengths to give students some practical experience of it - I first encountered a softcore PIC in an FPGA used to expose its pipeline, memory and data busses for the students to observe and extend. We moved to an Atmel part (an ATmega128) with a real external memory interface when it became available: multiplexed address + data, but designed so it could drive a 74 series 8 bit latch to generate the full bus. A big improvement over our softcore PIC! We would set the task of mapping an 8k SRAM, an 8 bit input and an 8 bit latched output into the memory map of the microcontroller.

    I am looking forward to developing similar exercises with a more advanced external memory interface (called the EBI, external bus interface) found on Fujitsu FM3 ARM Cortex M3 based microcontrollers. It is really designed to drive FLASH memories transparently, which is no mean feat, and it has the following neat features:
    • It does have an external wait input which could prove very useful
      for custom peripherals.
    • It has a basic SRAM mode with configurable timing for read and write strobes.
    • It can transparently split and rebuild 32 bit data for 8 bit (or 16 bit) bus widths
      on the fly by issuing the correct sequential addresses.
    • It can natively drive various types of FLASH memory including NAND block erase and NOR page reads etc. while pretending to be normal memory to the software.

    Tuesday, 12 July 2011

    Supporting a Microcontroller Course with Hardware

    If you have come to this article for a prescription then I am afraid you are in the wrong place. I have structured this piece as a list of questions you need to answer to obtain the microcontroller that really suits your wants and needs. I have included the answers that work for me at the end to show how I have answered my needs.

    Teaching microcontrollers always starts with the architecture. Sadly this is hardly ever a process that starts with a clean sheet. Each university or company typically has historical investments in a few preferred architectures and sticks to them like glue. Opportunities to change existing courses should always be carefully weighed for the costs and benefits.

    If you are starting from a clean sheet, or reconsidering existing teaching, let me urge you to consider the ARM architecture for one incredibly good reason: the prediction that one ARM processor will be manufactured per person per year by 2014 (EE Times). There are already billions out there in the world. If you are not teaching ARM, consider very carefully whether you are doing the right thing by your students by leaving this very valuable training out of their studies. If you want other reasons, consider that many manufacturers sell ARM powered chips with 32 bit datapaths and high clock speeds at the same prices as 8 and 16 bit microcontrollers with comparable peripherals! Access to development kits mounting ARM Cortex-M devices with programmers/debuggers can cost as little as $12.

    [It should be no secret that I am an admirer of ARM cores and have been working with ARM for some time now]

    What are the course choices?

    Excluding budget constraints and existing equipment, and if the number of hours in the course is fixed, then we can classify courses into three types. These types are based on where the weight of the learning outcomes is placed as well as the existence of suitable hardware and software support. The three categories are:
    • "Bare metal" (bare chips and a programmer/debugger) - This is one of the best ways of achieving learning outcomes that include basic hardware requirements of microcontrollers i.e. clocks, resets, capacitive decoupling, etc. Coding will consider the booting of the microcontroller as well as code to support any peripherals.
    • Interfacing (a PCB with a few LEDs and switches, probably a clock crystal) - Learning outcomes mainly consist of building interfaces to other peripherals and hardware and advanced coding to support the peripherals.
    • Embedded Software (a loaded PCB where every interface or IO is connected to an appropriate demonstration peripheral) - This emphasizes coding, probably including a suitable RTOS or algorithms development for embedded systems.
    No matter what level of course you decide to support there is one basic point that I think needs to be addressed right now:
    On Chip Debugging. Can the actual ASM code and data in RAM be observed while the microprocessor is executing software?
    It can be very easy to end up with hardware that offers no visibility into the microprocessor/microcontroller as it executes software. In my opinion this damages your students' ability to understand and experiment with code inside the microcontroller. In my opinion on chip debugging is a basic pedagogic requirement for any course, not a "value added extra". [NB It is possible to design courses around this limitation but, really, in the 21st century why should you have to? And why should your students be limited in this way?]

    Practical considerations when choosing a course

    The main question when considering which of the three levels of microcontroller course to teach is whether you have any electronic laboratory capability, i.e. lab equipment (PC, oscilloscope, function generator, power supply and parts) and trained teaching and support staff.
    If you have a lab then you can look at implementing any of the levels. If not then go for Embedded Software straight away or establish the lab later. "Bare Metal" and Interfacing are not for you!
    As a practical point for "Bare Metal" courses if chips are not available or convertible to a DIP package choose another architecture to teach.
    If you have the lab then the investment in development kits may be a factor. The fully featured kits necessary for Embedded Software are the most expensive, and the least effective at supporting the Interfacing course. At best a few GPIO are available for your students to play with.
    Homebrew hardware/Custom hardware or commercial? This is always a tricky one and best left to your judgment. A few points to consider are:
    • Have you used the micro before? If not a commercial kit may help, at least for the first few years.
    • Do you really need a custom system? If you are doing something specialist or have an existing investment in expansion hardware then a custom system can be a part of a really interesting and challenging course.
    • Custom hardware needs designing and then supporting i.e. repairing, updating, etc. and often proves much more expensive over time than commercial kit.
    Which architecture should be taught?

    Bearing in mind the three "levels" of course just discussed, we can look seriously at which architecture to support from two points of view: pedagogic (teaching) and practical.

    • Are the functions of the CPU core easy to separate into a simple subset? All real cores tend to have advanced features and blend certain activities due to the need for speed but for training purposes can you keep them separate?
    • Memory access. This is pretty important: in my opinion it should be a single, unpaged memory space, and the ability to directly execute operations on data in memory should be limited - i.e. math happens in registers.
    • Is the architecture RISC or CISC? Teaching any CISC architecture is typically only sensible from a software programmer's point of view, and even then it doesn't lend itself to a structured course. I would firmly suggest that a RISC form the basis of your courses.
    Basically I am a RISC fan from a teaching point of view. As I have to teach fundamentals of hardware structure and design as well as software starting from a RISC is a huge help to my students. [As a cheeky point most CISC designs these days suggest a limited subset of functionality be used for speed so teaching that subset is a great idea.]

    • Confidence/Experience - If you have had success with a design or device then a new device represents a big unknown in terms of software and hardware. Remember those unexpected Errata?
    • Documentation - Typically a university has a huge amount of existing documentation and notes built up around an architecture. Think of all the tutorials and lectures that need rewriting!
    • Staff training - Your lab helpers have to know how things work, i.e. does the debugger connect every time? Do you have to reboot the PC if the thing hangs? Does the software have any quirks? (perhaps what quirks does the software have)
    These reasons are also very true for companies but I would suggest that the documentation point would be replaced with code libraries when a commercial client is involved.

    What hardware?

    Here are the factors that I think are most important in choosing a development kit. This is where ARM technologies really score - compared to proprietary architectures there is a real diversity of choice of chip manufacturer with about every possible peripheral included:
    • Cost - a lab full of development kits adds up to a pretty steep price
    • Programmer/debugger cost - see above. Don't forget that many programmer/debuggers are NOT bundled with the development boards. Also if you want your own students to have their own personal boards they are going to need programmer/debuggers.
    • Software - how easy is it to access the compilers and debuggers
    • Robustness - students are pretty hard on kit. How much work are you going to have to do to harden the boards electronically and physically?
    • On chip peripherals - if you are going to teach standard peripherals like USARTs or I2C then make sure they are included in your micro

    What software?

    We then need to consider the programming environment. Again ARM has a real breadth of choice of IDEs and compilers for ARM based microcontrollers. The following points should be borne in mind:
    • Is there a free version for students, and if so how limited is it?
    • Is the full version terribly expensive and under what conditions can it be accessed? Hardware companies can afford to be more generous with software than pure software companies, as for them the software is not the core business.
    • How easy is it to support? There are plenty of development IDEs that require Administrator access to run or access the debugger and are often very unstable or full of bugs*.
    [*IDEs with bugs can be a problem for a professional engineer but for students this can be a showstopper. It is often hard enough for them to grasp the correct operation of an IDE/micro/debugger system let alone diagnose a faulty one. Also imagine a whole lab full of machines with full admin access in the hands of students? Note I am painfully experienced in getting limited admin access for certain programs, but really, why would I want to?]

    My choices

    [These choices are influenced by my relationship with ARM but they may help you work through your own choices. It is important to declare interests.]

    The two courses I am planning to support are two Interfacing courses. I also have an eye on project work which involves microcontrollers. Both courses are established and apply different requirements on the hardware that is going to be used.

    What architecture?

    Looking at ARM architectures for what I wish to achieve, it is clear that we should be looking at the current generation of cores, i.e. the Cortex M or A series: M for microcontroller or mixed signal, A for application. From an architectural point of view the simpler M series is clearly preferable. We are looking at Interfacing and the available M series parts are much more suited to that goal in terms of their peripherals. For Embedded Software, high end M parts or A parts would be just fine. For bare metal work there are some small pin count M parts which can be adapted to a DIP pinout, but it is more challenging to use ARM for that type of course.

    IDE, compiler & debugger

    I am looking at using the uVision IDE from Keil (owned by ARM) because:
    • It is compatible with a very wide range of devices from many manufacturers so you are not tied to any one single company. This will allow a lot of reuse of notes if the target microcontroller is withdrawn or updated
    • It has compilers, assemblers, and on chip debugging facilities
    • There is the essential free version for anyone, with a code size limit - 32kb of code max (MDK-ARM Lite) - well above anything you are likely to need. [ARM typically have always supported universities strongly so a donation of the full version for internal use in teaching and research may be very possible - talk to them]
    • Keil tools are also pretty well behaved as windows programs and receive regular updates to fix bugs. Other IDEs I have used don't even regard some problems as bugs at all!
    • Keil donations are accessible via ARM, which does not have the sale of development software as its core business, with the obvious implications for donations.
    This IDE will support both courses quite happily as well as scaling to include larger student projects. This will reduce the amount of repeated documentation needed across the two courses as we will be supporting one IDE.

    Development kits

    The two courses have quite different requirements. One course needs two boards:
    • One for the students to own by themselves - cost is a very important consideration
    • One with a large amount of IO for the labs and direct access to a memory mapped IO space
    • Neither board should have too much built in demonstration hardware (it pushes up the cost and wastes valuable IO capability)
    The other course needs a board with native USB connectivity and as much raw access to IO as possible. Both courses need all the boards to be compatible with the Keil uVision IDE/compiler/debugger suite.
    Resources to help find suitable ARM development platforms can be found on the university program section of ARM's website. Here are the highlights of the development boards I have considered:

    Low cost student owned development board

    The winner: ST STM32VL-Discovery
    • Micro is the ST STM32F100RB, a Cortex-M3 running at 24MHz with 128kb of flash and 8kb of RAM
    • On chip peripherals: 1x ADC, 2x DAC, Timers, 2x I2C, 3x USART, 2x SPI and something called CEC. There is also a DMA unit
    • The programmer/debugger is an ST-Link built onto the top section of the board. It can also be used to program and debug other ST STM32 ARM based microcontrollers using the ARM Serial Wire Debug (SWD) bus
    • The ST-Link is Keil uVision compatible for programming and on chip debugging
    • $12 as of the 11th of July from DigiKey
    Honourable mention: NXP LPCXpresso
    • LPCXpresso is both an IDE (powered by code_red technology) and a set of development boards for NXP's LPC ARM based microcontrollers
    • Supports various LPC families of microcontrollers based on the ARM Cortex-M3 and simpler Cortex-M0
    • Built onto a (one time detachable) NXP LPC-Link programmer/debugger that is supported by the code_red based IDE
    • $29.95 as of the 11th of July from DigiKey. Note the variety of parts available as well as more complex and higher cost LPCXpresso compatible boards
    For consideration: ARM NXP mbed
    • Needs no special programming hardware or any locally installed software as it is programmed from the mbed website using a web browser. Appears to the PC as a USB stick where if you place a file on it and press a button it programs itself
    • Seriously limited for my purposes by the lack of On Chip Debugging and simplified programming system [NB not due to the NXP micro but due to the requirement to not need locally installed software]
    • This was not a serious candidate for the level of education that I wish to engage in however is a very serious player for courses in high school or the first year of University
    • Does not come with a "standard" programmer/debugger
    • $60 as of the 11th of July from DigiKey

    Lab board with native USB connectivity and external memory interface
    [NB This board is a Keil product due to my relationship with ARM, not that there are not excellent candidates from other providers.]
    [NNB Keil does make excellent boards though!]

    The winner: Keil MCB9B500 Evaluation Board
    • Micro is the Fujitsu FM3 MB9BF506, a Cortex-M3 running at 80MHz with 512kb of flash and 64kb of RAM
    • Full physical access to each and every pin on the device
    • On chip peripherals:  USB2.0 Device and Host, 2x CAN, 8 channel DMA, External Bus IF supporting 8/16 bit SRAM, NOR and NAND flash with up to 8 chip selects, 8x USART/CSIO/LIN/I2C Serial ports, 8x Timers, 2x Multi-Function Timers, CRC Accelerator and 3x 16 channel ADCs
    • Apart from a few LEDs, switches, one potentiometer and the USB device and host ports, all the rest of the IO is accessible via two fantastic 0.1" pitch dual row sockets. 0.1" pitch headers are the best educational headers, being sufficiently small to be convenient but strong enough to take abuse and rough handling
    • Programmed and debugged by the ULINK-ME programmer from Keil (not shown). [NB This is not available for general purchase, only with new kits so be sure to make sure it is included]
    • Compatible with uVision IDE from Keil [NB Obvious, perhaps]
    • $100 as of the 11th of July from DigiKey (I haven't linked it here as DigiKey doesn't seem to sell the version with the bundled ULINK-ME but I can't be sure. Buyer Beware)
      [All pictures are copyright their respective owners and are reproduced here for convenience. For owner information consult the images ALT text. ARM, Cortex, Keil, uVision, mbed, LPC, ST-Link, LPC-Link, Fujitsu FM3 and ST Discovery are probably all trademarks of their respective companies.]

        Wednesday, 16 March 2011

        I have moved to Purdue in the USA!

        Following on from a madly busy time at Southampton, UK, I am now working for Purdue University in the USA, which should be a lot of fun!

        Tuesday, 29 June 2010

        The future of computing? The SpiNNaker million processor computer

        The future of computing is a very big claim. However, if anything is clear it is that the future of computing does not lie in a common time, unified shared memory system - which pretty well describes every multicore computer made at the moment. The trouble is that once you go past a few thousand cores the shared memory, shared time concept falls to bits. It just doesn't scale. Where then should we look for a new model other than biology? The brains of living creatures are capable of phenomenal processing power and yet have almost none of the features of the computers we build today. Don't you find this odd?

        Let's look at how brains do it:

        • Does each neuron have a sense of the passage of time? Not in the traditional sense and maybe not at all. If it does, it will probably only be a vague ordering.

        • Does a neuron share a memory space with every other neuron? No. The closest concept is an area effect of diffusion of chemicals which affect neighbours (which may not be synaptically connected) but that is very far from a shared memory concept.

        • Is each neuron connected to every other neuron? They have many connections to close neurons and some connections to neurons further away best described by a statistical distribution but otherwise, again, no.

        So clearly modern computing bears as much relation to a brain as sudoku does to quantum mechanics. The closest we get are the datacenters of a company such as Google, but they enjoy far too much connectivity to be a good model.

        Enter the world of academia because no commercial company would be stupid enough to commit the resources we can into a project like this (and all the ones before it which make it possible) - it is unlike anything that has come before and it is risky (i.e. it may not work!).

        Project SpiNNaker

        This revolutionary idea is nothing less than a plan to put together a computer of approximately 1 million cores with no common clock or shared memory and which can route messages with a model that approximates a neurological system.

        Born in the University of Manchester's Advanced Processor Technologies Group the SpiNNaker project has taken shape in collaboration with the University of Southampton's School of Electronics and Computer Science (where I work), the Engineering and Physical Science Research Council and two enlightened commercial companies: ARM (for the processor IP) and Silístix (for their Network on Chip expertise).

        SpiNNaker is a child of many parents, each a vital step on the path to its genesis:
        1. AsipIDE GALS Design and Co-Simulation Framework - A hardware/software co-design and debugging framework
        2. Transactional Memory - A new locking strategy which massively simplifies the ability to lock data to pass information between processes
        3. TERAFLUX: Exploiting Dataflow Parallelism in Teradevice Computing - A pan-European project looking at computers with massive numbers of cores
        4. The Balsa Asynchronous Synthesis System - A language and compiler targeting asynchronous, handshake driven logic design
        The centrepiece of the SpiNNaker project is a special System on Chip, the SpiNNaker SoC. Inside are 18 processors asynchronously connected via a blindingly fast network-on-chip and communicating via hundreds-of-megabit links to the other SpiNNaker SoCs. With these SoCs it will take only 56,000 chips to reach the target of 1 million cores. The cores are ARM968 series processors capable of significant independent computation while communicating with their neighbours or the 1Gbit DDR SDRAM available to each SoC. Even within a SoC the cores do not share a common clock; they communicate by passing messages which are routed to each other or outside to another SpiNNaker SoC.

        A seriously ambitious project you say? Absolutely right. Vapourware or a dream? Hell no! Feast your eyes on this:

        This is the first generation of SpiNNaker SoCs on a test board - there are four dual processor SoCs on this board each next to its accompanying 1Gbit ram chip (click on the picture for a very high resolution version). This board already holds 8 processors asynchronously interconnected with all the necessary debug hardware to perfect the design of the next generation.

        Whoever said British Science was dull?

        BIG DISCLAIMER: I am not one of the great minds trying to change the world with this project. At best I have helped a couple of people around the edges. I am a very big fan, however!

        Thursday, 22 April 2010

        A fixed OpenSPARC T1 build for Design Compiler 2009....

        Just a very quick note to say that we have fixed the compile problems for the OpenSPARC T1 processor when building it with modern versions of Design Compiler (anything newer than 2007).

        More information, including sizes and speeds on the Synopsys 90nm EDK, to follow. Bear in mind it is missing the PLL and a couple of other small modules.

        We will, of course, push this upstream.

        Wednesday, 31 March 2010

        Current activity with the EVE ZeBu Hardware/Software Co-Verification Environment

        The title is a fancy way of referring to the EVE ZeBu accelerators (much more information on their website). I previously posted on our acquisition of a UF-2 (I should mention we liked it so much we now have two!) so I am taking a moment to show what we are up to with this wonderful technology. I have broken it into research and teaching topics.

        Behavioural Simulation and Synthesis of Biological Neuron Systems using VHDL
        The investigation of neuron structures is an incredibly difficult and complex task that yields relatively low rewards in terms of information from biological forms (either animals or tissue). The structures and connectivity of even the simplest invertebrates are almost impossible to establish with standard laboratory techniques, and even when this is possible it is generally time-consuming, complex and expensive. Recent work has shown how a simplified behavioural approach to modelling neurons can allow “virtual” experiments to be carried out that map the behaviour of a simulated structure onto a hypothetical biological one, with correlation of behaviour rather than underlying connectivity. The problems with such approaches are numerous. The first is the difficulty of simulating realistic aggregates efficiently, the second is making sense of the results and finally, it would be helpful to have an implementation that could be synthesised to hardware for acceleration. In this paper we present a VHDL implementation of neuron models that allows large aggregates to be simulated. The models are demonstrated using a synthesizable system-level VHDL model of the C. elegans locomotory system.
        The role of the EVE in this specific project is to verify and execute the functionality of the largest of the neural net models using co-simulation, replacing the previous, limited technology based around an FPGA board driven by a probe program.
        Bailey, J., Wilson, P., Brown, A. and Chad, J. (2008) Behavioural Simulation and Synthesis of Biological Neuron Systems using VHDL. In: BMAS. (In Press)
        Bailey, J., Wilson, P. R., Brown, A. D. and Chad, J. (2007) Behavioural Simulation of Biological Neuron Systems using VHDL and VHDL-AMS. In: IEEE Behavioural Modeling and Simulation, Sep 2007, San Jose, USA. pp. 153-158.

        Architectures for Numerical Computation

        Since the 1960s, the observation known as Moore’s Law has become a self-fulfilling prophecy. Processing power doubles every two years because of the advances in CMOS technology. There are clear signs, however, that these technological advances are coming to an end. The economics of pushing CMOS technology to its physical limits will eventually halt further development.
        If it is no longer feasible to increase computing power through smaller, faster transistors, the alternative is massive parallelism. This progression is already apparent. Multi-core and multi-threaded processors are now common. Although modern operating systems are able to use multiple cores, with few exceptions, programs are confined to single cores. The challenge facing software engineers is to make best use of multiple cores.
        A significant amount of processing power is concerned with numerical computation. Consumer applications, such as image and audio processing, are fundamentally numerical. Similarly, engineering applications, such as simulation and optimization, rely on numerical calculations. At this point, we should distinguish between consumer and desktop applications and High-Performance Computing (HPC) tasks that rely on clusters of dedicated processors. It is not our intention to move into the HPC world at this time.
        While using multiple cores can accelerate many numerical algorithms, far greater speed-up would be possible using more specialized forms of hardware, such as GPUs and FPGAs. A further consideration is power consumption (and the related problem of heat dissipation). Custom hardware can reduce power consumption by an order of magnitude or more. The key, of course, is to use the resources in the best possible way. In the context of the work proposed here, there are two aspects to this problem. First, we need to make the best division between hardware and software and second, we need to design an appropriate overall architecture.
        The obvious role of the EVE platform in this research field is to support the research into specific computing pipelines, fine-grain computation blocks and architectures, as well as enabling the development of some 16-lane PCIe computation accelerators.

        Hosting a Complex SoC on the EVE Platform
        This project was a proof of concept and de-risking of the EVE transactor flow using a large SoC. The SoC in question was chosen to be the Gaisler-Aeroflex LEON3. The LEON3 SoC is based around a SPARC v8 compatible CPU and is written in VHDL. The minimal LEON3 SoC was built using the EVE support for memory to model the processor cache, along with the transactors for DRAM and UARTs. This work will be extended to include the VGA, Ethernet and USB transactors on the hardware side, and to include support for the SnapGear Linux port for the LEON3.

        Verification of a highly integrated ASIC
        A large master's-level project framework will produce a heterogeneous multicore ASIC to perform processing on HD video data streams. It will bundle a 32-bit microcontroller core, on-chip SRAM and our custom geometric processor, along with a multilayer AMBA bus architecture optimised for power and contention. The EVE will be invaluable in the simulation and verification of the final design before it is sent for manufacture.

        So there you have it - quite a lot going on, all of which is really fascinating and fun!