Thursday, June 12, 2014

Modding a Gigabyte i-RAM (GC-RAMDISK) to run without being plugged into a PCI slot...

Gigabyte i-RAM or GC-RAMDISK, PCI version

The Gigabyte i-RAM (or GC-RAMDISK) is a curious device. I have never been sure whether it was a commercial success, but the main specifications are:
  • PCI 3.3V and 5V slot for power (bottom edge connector)
  • SATA 150 hard disk connection (red connector, top right)
  • Maximum capacity of 4 gigabytes (yes 4)
  • One benchmark has it at 25,576 RW IOPS for 512k (YES 25k IOPS)
  • Doesn't wear out or degrade with time or capacity like an SSD
  • Backup battery (on the right) keeps the data on the drive live for about 16 hours.
The only point of real interest is the IOPS. You can get the whole setup from eBay for <$200, while achieving better IOPS from an SSD requires something like the OCZ Vector at $450. Note that for all normal uses the OCZ Vector is a much better solution!

ZFS ZIL

I am a big user of ZFS on Linux for many reasons (note: unless they have released 0.6.3, use git HEAD, as many important fixes have been bundled in since 0.6.2).
ZFS is an excellent file system (you can read about it here) but mainly I value it for the data integrity first and the ease of configuration second.
ZFS is designed to take advantage of hierarchical storage to deliver increased performance, i.e. you can feed it hard disks for bulk storage, SSDs and things like this RAMDISK as caches/log volumes and it will present a single, high integrity, higher performance, volume to the operating system.
The log I care about particularly (as do most ZFS users) is the ZIL or ZFS Intent Log. For full details read the blog, but basically, every so often a filesystem must commit data to persistent storage before carrying on, for data integrity and safety's sake. Having a low latency, separate log allows the filesystem to massively accelerate its performance by moving those critical writes off the slow bulk storage onto a dedicated device until they can be committed to bulk storage in a scheduled, ordered, sane manner. This also frees the bulk storage to service the other outstanding requests for data.
SSDs work well as ZIL devices; however, the device is constantly hit with small writes, which wears the disk reasonably quickly and degrades its IOPS as different areas of the flash array are written and the management IC works harder.
The Gigabyte i-RAM (GC-RAMDISK) doesn't wear, nor does it suffer from degraded IOPS over time. It also still has a very high IOPS value for such an old piece of hardware.

Modifying it to run outside a PCI socket

While the i-RAM looks like it is a PCI card it merely uses the socket for power, nothing else. A bit of reverse engineering reveals the i-RAM uses the following:
  1. 12V seems to be related to the battery backup subsystem
  2. 5V used to power the DDR termination supplies and charge the battery, certainly high current
  3. 3.3V used to power the main electronics on the board
  4. 3.3Vaux [always on] used to trickle charge the battery and keep the RAM contents alive when the computer is powered down
  5. RST# used to switch the card from standby to on when the PC powers up and off again on power down
So if we look inside the PC (not using the motherboard or a PCI slot) then the ATX connector has nearly all the signals we need:
  1. 12V
  2. 5V
  3. 3.3V
  4. 5Vsb [always on] 
  5. PS_OK indicates the power supply is fully turned on and the voltages are stable
So the connections are (i-RAM - ATX):
  1. 12V - 12V
  2. 5V - 5V
  3. 3.3V - 3.3V
  4. 3.3Vaux - Power Regulator - 5Vsb
  5. RST# - PS_OK
The only supply missing is 3.3Vaux; however, 5Vsb is available, and with a suitable linear LDO and heatsink it was possible to build a regulator for it. The i-RAM is quite naughty and draws far too much current from the PCI bus's 3.3Vaux rail - I have measured it at over 1.3A! To cope with this I chose the Texas Instruments LM1084, a 3A linear LDO regulator.
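As a sanity check on why a heatsink is needed (using my measured 1.3A figure), the worst case dissipation in a linear regulator dropping 5V down to 3.3V is:

```latex
P = (V_{in} - V_{out}) \times I = (5.0 - 3.3)\,\mathrm{V} \times 1.3\,\mathrm{A} \approx 2.2\,\mathrm{W}
```

Over 2W of continuous dissipation is enough to demand a proper heatsink on the regulator package, hence the 3A rated LM1084 rather than a smaller part.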

Here are the photos of the final product:
 
Rebuilt i-RAM attached to standoffs inside the case lid
Component side (B) of the modified i-RAM showing heavy flywires and Molex sockets
Side A of the modified i-RAM showing heavy flywire connections
Inline ATX 5VSB to 3.3Vaux linear regulator (in fact the regulator on this board is incorrect and was replaced with the LM1084)






Screw terminal block splicing the necessary ATX voltages for the i-RAM

Wednesday, May 28, 2014

Making ModelSim ALTERA STARTER EDITION vsim 10.1d work on Ubuntu 14.04

[WARNING: Some people are reporting that following the steps for them does not fix the problem. I am working on trying to find out what the issue is.]

Trying to get a version of ModelSim running on a very modern version of Linux often presents challenges. Luckily I had lots of helpful information on the internet (major sources linked below) to get it going. This article mostly adapts the work done by the Arch Linux crew.

Problem number one: The free version of ModelSim Altera Edition is 32 bit only while the normal Linux PC will be 64 bit.

On Linux this requires us to install the 32 bit versions of the libraries that it depends on. Luckily this is fully supported on a modern Linux like Ubuntu 14.

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install gcc-multilib g++-multilib \
lib32z1 lib32stdc++6 lib32gcc1 \
expat:i386 fontconfig:i386 libfreetype6:i386 libexpat1:i386 libc6:i386 libgtk-3-0:i386 \
libcanberra0:i386 libpng12-0:i386 libice6:i386 libsm6:i386 libncurses5:i386 zlib1g:i386 \
libx11-6:i386 libxau6:i386 libxdmcp6:i386 libxext6:i386 libxft2:i386 libxrender1:i386 \
libxt6:i386 libxtst6:i386

Problem number two: If you have the following error when running vsim:

** Fatal: Read failure in vlm process (0,0)
Segmentation fault (core dumped)
 
Then you probably need to build a different version of freetype, the font rendering library, and modify ModelSim to use it. For an unknown reason ModelSim has an issue with the modern versions shipping in Arch and Ubuntu 14.04. First download the source code of freetype 2.4.12:

http://download.savannah.gnu.org/releases/freetype/freetype-2.4.12.tar.bz2

Now install the build dependencies needed for libfreetype6, extract the source (using tar) and configure and build libfreetype:
sudo apt-get build-dep -a i386 libfreetype6
tar -xjvf freetype-2.4.12.tar.bz2
cd freetype-2.4.12
./configure --build=i686-pc-linux-gnu "CFLAGS=-m32" "CXXFLAGS=-m32" "LDFLAGS=-m32"
make -j8
The finished libraries are now available inside the "objs/.libs" directory. As they are necessary to run ModelSim, we copy them into the install directory so they don't get lost, then modify ModelSim's vsim script to use the new libraries instead of the system wide versions. Change directory to where you installed ModelSim (/opt/altera/13.1/modelsim_ase/ on my system). Note you may need to edit the directory paths to match those used on your system.
sudo mkdir lib32
sudo cp ~/Downloads/freetype-2.4.12/objs/.libs/libfreetype.so* ./lib32
Now we need to edit the vsim launch script to ensure the new freetype libraries are used:
sudo vim bin/vsim
Search for the following line:
dir=`dirname $arg0`
and underneath add the following new line:
export LD_LIBRARY_PATH=${dir}/lib32
 
Test by running vsim and hopefully you will be greeted by the ModelSim GUI.

[Tested on fresh install of Ubuntu 14.04]

Sources:
  1. https://wiki.archlinux.org/index.php/Altera_Design_Software
  2. http://stackoverflow.com/questions/3261909/build-32bit-on-64-bit-linux-using-a-configure-script 
  3. https://wiki.debian.org/Multiarch/CrossDependencies

Saturday, May 10, 2014

Making my Lenovo MCE RC-6 IR Receiver immune to interference by my TV backlight

Sadly, the newer technologies employed in CFL bulbs (which, incidentally, is the backlight technology in my TV) emit a lot of IR interference in the frequency ranges used by older IR remote controls. Vishay publishes a summary which goes into more technical detail here: http://www.vishay.com/docs/80072/disturan.pdf.

My MCE receiver is an RC-6 system, an SMK RXX6000-40 branded Lenovo; if you want to buy one of your own then please check eBay.


HP Remote with USB Receiver RC-6 [Taken from eBay auction linked here]

However, despite being aesthetically excellent, the receiver suffers from interference from the TV backlight, which makes it unusable unless the room lighting is bright - wholly unsuitable for certain movies.

This is a pretty common problem for the older IR systems however newer IR receiver modules have been enhanced with better filtering systems.

Opening the receiver is easy: a single screw is hidden in the center of the label, and the top and bottom come apart to reveal a single PCB.


The IR module is the black module standing up on the lefthand edge of the PCB (the "front" of the PCB is the south west side). Careful scrutiny of the top of the module reveals it is a Vishay Siliconix TSOP34838, a 38kHz filtered IR module.
Consulting the datasheet from Vishay, http://www.vishay.com/docs/82489/tsop322.pdf, we can see that there is another, pin compatible module with an enhanced filtering system (Vishay's Automatic Gain Control 4, or AGC4) to exclude the interference the TSOP34838 is vulnerable to.

I have the new module on order and when it arrives I'll post the results.




My current home media center build: Mythmaster

Apologies - I have been working for too long under various agreements that prevent me discussing my work, so I haven't posted anything in far too long.

Hardware

Bought on eBay from Tamsolutions (there may be more available, as they seem to have regular shipments as of May 10th 2014). The computer is quite noisy, so it lives down in the basement and is connected to the TV by an HDMI extender and a USB extension cable.

Case: AIC RSC-4ED2 with 6x SATA-II 3.0 Gbps four position backplane (TW-000-51675-AR).

Motherboard: SuperMicro H8DME-2 nVidia MCP 55 Pro.
 - Modified one of the PCIe 8x sockets to accept an x16 GFX card by opening the end of the socket.

GFX Card: GeForce GT 240. [Chosen as it can do the most complete deinterlace on 1080i video]
 - Modified with a flywire to connect the PRSNT#1 pin (1A) to PRSNT#2 (48B) so the motherboard detects the card in the modified PCIe 8x socket.

CPU: Two Quad Core Opteron 2389s. [An eBay purchase to replace the single Dual Core 2212 HE. Don't forget the extra CPU cooler]

Memory: 24 Gigabytes of PC-5300P. [Increased by another eBay purchase]

3x SATA Controller Cards: AOC-SAT2-MV8

Drives:
2x HP ST3250824AS 250GB SATA HDDs in Linux Software RAID 1. Note please don't waste time trying to enable SATA-II or NCQ on the HP versions of this disk. They are SATA-I and don't advertise NCQ.

6x 2TB Seagate or Western Digital HDDs managed by ZFS.

TV: 47" VIZIO E470VA

HDMI Extender: No real name HDMI over dual CAT5e or CAT6 cables from Amazon.

USB Long Cable: An active USB 2.0 cable from Amazon.

ATSC Receiver: HDHR4-2US Silicondust HDHomeRun Dual Tuner.

Remote control: Lenovo RC-6 receiver by SMK RXX6000-40.

Software

OS: Linux Ubuntu 14.04 LTS.
 - nVidia 319 drivers from Ubuntu.
 - Pulseaudio removed to enable sound in XBMC standalone
 - User autologin running XBMC standalone.
 - Modified the udev rules to enable the IR remote (MCE USB by SMK) device ids, read this thread for details.

Disk Management: ZFS on Linux, git HEAD (the current 0.6.2 release is old and a new version is about to be released; the latest git version is necessary to work on kernel 3.14).

Frontend: XBMC 13.0 Gotham.

Backend: Mythtv 0.27 managing the Silicondust HD HomeRun and scheduling recordings.

Monday, August 8, 2011

Mapping the Absolute Addresses of Registers from a C Header

Why am I doing this, you may ask? Doesn't the microcontroller manufacturer provide the absolute addresses of the registers? Well in this case, no they don't, and I need them to complete the plugin for the Keil µVision debugger which will make them visible and editable via human readable names.

There are a number of ways of mapping hardware into C, but due to the limitations of the const keyword [did you know you can cast the const away? Add it to the list of C quirks, like = actually being assignment, or declaring a pointer with * and then dereferencing its content with *, etc., etc.] most microcontroller manufacturers end up using #define statements, as these can be made to result in fixed numeric constants that can then be optimized correctly in all situations.

[For a quick counter example create a function that accesses a global const pointer, call it and then look at the ASM. Does that look optimal to you? But don't blame the compiler - it can't assume you haven't messed with the const pointer with casts! A good discussion is here.]

Anyway the normal way is to begin with several layers of #defines. The reason is for maintainability and portability. Typically the registers for a microcontroller peripheral are created at a fixed offset from a base and typically they will want to use the same peripheral in several micros. So this is how they arrange things for the USB module in the Fujitsu MB9BF506R, an ARM Cortex M3 based microcontroller:

#define FM3_PERIPH_BASE    (0x40000000UL)

...

#define FM3_USB0_BASE      (FM3_PERIPH_BASE + 0x42100UL)

So the USB peripheral is actually to be found starting at address 0x40042100UL. However the actual device registers are laid out by a C struct. The struct creates the offsets for the individual registers internally and maps them to the human readable names.

Structs have several rules imposed on them by the C standard and also have a few gotchas so please treat these header files with great care when used with a different compiler UNTIL you have verified it. The rules are:
  1. The named elements of a struct will appear in memory in the same order as they are declared in C.
  2. Suitable padding may be included between elements to speed access to the data.
For more information refer to this article: http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm.
    [If you don't want suitable padding to be added then consider using the pack attribute, e.g. in gcc #pragma pack. This can have terrible performance consequences so as always, know what you are trying to achieve.]

    Here is a fragment of C from the mb9bf506r.h header file which creates the C symbols for the appropriate registers:

    typedef struct
    {
      union {
        union {
          __IO uint16_t HCNT;
          stc_usb_hcnt_field_t HCNT_f;
        };
        struct {
          union {
            __IO  uint8_t HCNT0;
            stc_usb_hcnt0_field_t HCNT0_f;
          };
          union {
            __IO  uint8_t HCNT1;
            stc_usb_hcnt1_field_t HCNT1_f;
          };
        };
      };
    ...
    

So, following the rules of structs in C, the first element is an unsigned, 16 bit register called HCNT at the first address of this struct, 0x40042100UL. Note the use of unions to allow the same address in memory to be accessed in several different ways. In this case they enable HCNT to be accessed as one 16 bit wide register or as two separate 8 bit ones called HCNT0 and HCNT1.

Let's look at the same piece but with the next register's definition included:

    ...
      union {
        union {
          __IO uint16_t HCNT;
          stc_usb_hcnt_field_t HCNT_f;
        };
        struct {
          union {
            __IO  uint8_t HCNT0;
            stc_usb_hcnt0_field_t HCNT0_f;
          };
          union {
            __IO  uint8_t HCNT1;
            stc_usb_hcnt1_field_t HCNT1_f;
          };
        };
      };
            uint8_t RESERVED0[2];
      union {
        __IO  uint8_t HIRQ;
        stc_usb_hirq_field_t HIRQ_f;
      };
      union {
        __IO  uint8_t HERR;
        stc_usb_herr_field_t HERR_f;
      };
            uint8_t RESERVED1[2];
    ...
    

The line of real interest is uint8_t RESERVED0[2]; between the two registers. An ARM device's memory map is byte addressed, i.e. each byte gets a unique address. So the 16 bit variable sits across two byte addresses (which the union allows individual access to). However, the main memory of an ARM system is typically organized in 32 bit words, i.e. two 16 bit halfwords or four bytes. The uint8_t RESERVED0[2]; array of two bytes 'pushes' the next register (the 8 bit HIRQ) to the start of the next 32 bit word. This means HCNT is at 0x40042100UL while HIRQ is at 0x40042104UL, 4 bytes offset from the USB peripheral base.

    Saturday, July 23, 2011

    The External Memory Interface as an Educational Tool

    In a recent email conversation with an engineer I ended up discussing the advantages of an external memory interface (EMI) over general purpose I/O (GPIO) on a microcontroller for the purposes of education. This short piece comes from that discussion.

    The question posed was this:
    "What advantages does an external memory interface have over driving a peripheral with GPIO?"
    ... which immediately brings out the question, what do I mean by an external memory interface and what do I mean by GPIO?

    General Purpose Input Output (GPIO)
    This typically refers to a pin of the chip which is not necessary for any of the operations of the chip but is available to a system designer using the chip. Put another way if none of the GPIO are used then the chip will function just fine. The pins are under software control and can be turned into either inputs allowing data to be read from the outside world or outputs pushing data to the outside world.

    External Memory Interface
    In modern digital computer architecture the standard method of connecting a processor core to the outside world is with a memory interface. This is typically formed from an address bus, a data bus (either bi-directional or two uni-directional ones) and some control lines to indicate whether data is being written or read.

    Normally this is present inside the chip but forever hidden from the designer inside and only a few designs allow it to be exposed. Those that do allow extra RAM or ROM memory to be added to the fixed amount of a normal microcontroller design.

    Engineering Advantages of EMI

A directly memory mapped bus has a couple of advantages over GPIO, so, as usual, whether you care depends on what you are doing (a typical engineer's comment!). Here are the two main differences:
    1. Native machine code access. Grants speed, reduced code size and I would guess lower power? No extra instructions to drive the GPIO and turn the bus around to slow you down. Simply create pointers and data structures as usual and prod the linker to place them in the external chips. Or use explicit addresses.
    2. Code execution from external memory. Related to the above.
1. is only an issue when speed is at stake, really. Code size isn't that big a problem most of the time, and the software wrapper around GPIO won't be very large. Certain ADCs and DACs (usually for video or software radio) pretend to be SRAM for extremely quick access and really require this method of access to function effectively.

    2. only matters if you have run out of on-chip non-volatile memory and have to move sections of your program off chip.

    [Just speculating re. 2. code execution: you could change the compiler to emit instructions to drive a GPIO for data memory accesses in a certain range. However I can't work out how to do a native instruction fetch from the external memory without a little code in internal flash copying it to an area of RAM first. Perhaps a block copy strategy held in flash as a bootloader type thing could minimize the pain of this if it was invoked as appropriate by a compiler or linker? Note that in the world of computers I have discovered that someone has always built an implementation of just about every whacky idea. If you know of a system that uses this paradigm then please let me know. PC's don't count.]

    In a final summary if you are accessing a data storage area or a peripheral occasionally and don't care about 1. or 2. then a simple software wrapper around GPIO would do just fine and be relatively efficient. I would guess 3 to 5 times slower?

    Educational Advantages of EMI

From an educator's point of view, when teaching computer architecture and interfacing, an EMI is gold for hammering into students' heads what is happening inside all general purpose computers. I mean simply this basic idea:

    "A CPU issues an address onto the bus with control signals and writes or reads data."

This very basic idea is surprisingly poorly understood by many students because it is completely hidden from them inside micros and PCs. In many EE or even CompE courses the students never physically observe it in action, or build their own peripheral and write the software to drive it. As long as it remains a mental exercise, a substantial portion of students will not really understand it, and without that understanding their grasp of computers and software suffers.

My colleagues (past and present) and I have gone to great lengths to give students some practical experience of it. I first encountered a softcore PIC in an FPGA, used to expose its pipeline, memory and data busses for the students to observe and extend. We moved to an Atmel part (an ATmega128) with a real external memory interface when it became available: multiplexed address + data, but designed so it could drive a 74 series 8 bit latch to generate the full bus. Much improved from our softcore PIC! We would set the students the task of mapping an 8k SRAM, an 8 bit input and an 8 bit latched output into the memory map of the microcontroller.

I am looking forward to developing similar exercises with a more advanced external memory interface (called the EBI, or external bus interface) found on Fujitsu FM3 ARM Cortex M3 based microcontrollers. It is really designed to drive FLASH memories transparently, which is no mean feat, and it has the following neat features:
    • It does have an external wait input which could prove very useful
      for custom peripherals.
    • It has a basic SRAM mode with configurable timing for read and write strobes.
    • It can transparently split and rebuild data from 32 bit to bus widths 8 bits (or 16)
      on the fly by issuing the correct sequential addresses.
    • It can natively drive various types of FLASH memory including NAND block erase and NOR page reads etc. while pretending to be normal memory to the software.

    Monday, July 11, 2011

    Supporting a Microcontroller Course with Hardware

    If you have come to this article for a prescription then I am afraid you are in the wrong place. I have structured this piece as a list of questions you need to answer to obtain the microcontroller that really suits your wants and needs. I have included the answers that work for me at the end to show how I have answered my needs.

    Teaching microcontrollers always starts with the architecture. Sadly this is hardly ever a process that starts with a clean sheet. Each university or company typically has historical investments in a few preferred architectures and sticks to them like glue. Opportunities to change existing courses should always be carefully weighed for the costs and benefits.

    ARM LogoIf you are starting from a clean sheet, or reconsidering existing teaching let me urge you to consider the ARM architecture for one incredibly good reason: the prediction that one ARM processor will be manufactured per person per year by 2014 (EE Times). There are already billions out there in the world. If you are not teaching ARM consider very carefully whether you are doing the right thing by your students by leaving this very valuable training out of their studies. If you want other reasons consider you can get ARM powered chips for the same prices as 8 and 16 bit microcontrollers with comparable peripherals that have 32 bit datapaths and high clock speeds from many manufacturers! Access to development kits mounting ARM Cortex-M devices with programmers/debuggers can cost as little as $12.

    [It should be no secret that I am an admirer of ARM cores and have been working with ARM for some time now]

    What are the course choices?

Excluding budget constraints and existing equipment, and assuming the number of hours in the course is fixed, we can classify courses into three types. These types are based on where the weight of the learning outcomes is placed, as well as the existence of suitable hardware and software support. The three categories are:
    • "Bare metal" (bare chips and a programmer/debugger) - This is one of the best ways of achieving learning outcomes that include basic hardware requirements of microcontrollers i.e. clocks, resets, capacitive decoupling, etc. Coding will consider the booting of the microcontroller as well as code to support any peripherals.
    • Interfacing (a PCB with a few LEDs and switches, probably a clock crystal) - Learning outcomes mainly consist of building interfaces to other peripherals and hardware and advanced coding to support the peripherals.
    • Embedded Software (a loaded PCB where every interface or IO connected to an appropriate demonstration peripheral) - This emphasizes coding, probably including a suitable RTOS or algorithms development for embedded systems.
    No matter what level of course you decide to support there is one basic point that I think needs to be addressed right now:
    On Chip Debugging. Can the actual ASM code and data in RAM be observed while the microprocessor is executing software?
It can be very easy to end up with hardware that offers no visibility into the microprocessor/microcontroller as it executes software. In my opinion this damages and interferes with your students' ability to understand and experiment with code inside the microcontroller. In my opinion on chip debugging is a basic pedagogic requirement for any course, not a "value added extra". [NB It is possible to design courses around this limitation but, really, in the 21st century why should you have to? And why should your students be limited in this way?]

    Practical considerations when choosing a course

    The main point when considering the three main levels of microcontroller courses you are thinking of teaching is whether you have any electronic laboratory capability, i.e. lab equipment (PC, oscilloscope, function generator, power supply and parts) and trained teaching and support staff or not.
    If you have a lab then you can look at implementing any of the levels. If not then go for Embedded Software straight away or establish the lab later. "Bare Metal" and Interfacing are not for you!
    As a practical point for "Bare Metal" courses if chips are not available or convertible to a DIP package choose another architecture to teach.
    If you have the lab then the investment in development kits may be a factor. The fully featured kits necessary for Embedded Software are the most expensive, and the least effective at supporting the Interfacing course. At best a few GPIO are available for your students to play with.
    Homebrew hardware/Custom hardware or commercial? This is always a tricky one and best left to your judgment. A few points to consider are:
    • Have you used the micro before? If not a commercial kit may help, at least for the first few years.
    • Do you really need a custom system? If you are doing something specialist or have an existing investment in expansion hardware then a custom system can be a part of a really interesting and challenging course.
    • Custom hardware needs designing and then supporting i.e. repairing, updating, etc. and often proves much more expensive over time than commercial kit.
    Which architecture should be taught?

    Bear in mind the three "levels" of course I have just discussed then we can seriously look at which architecture we should support from two points of view: Pedagogic (teaching) and Practical.

    Pedagogic
    • Are the functions of the CPU core easy to separate into a simple subset? All real cores tend to have advanced features and blend certain activities due to the need for speed but for training purposes can you keep them separate?
    • Memory access. This is pretty important: in my opinion it should be a single, unpaged memory space, and the ability to execute operations directly on data in memory should be limited, i.e. register based math
    • Is the architecture RISC or CISC? Teaching any CISC architecture is typically only sensible from a software programmers point of view and even then it doesn't lend itself to a structured course. I would firmly suggest that a RISC form the basis of your courses
    Basically I am a RISC fan from a teaching point of view. As I have to teach fundamentals of hardware structure and design as well as software starting from a RISC is a huge help to my students. [As a cheeky point most CISC designs these days suggest a limited subset of functionality be used for speed so teaching that subset is a great idea.]

    Practical
    • Confidence/Experience - If you have had success with a design or device then a new device represents a big unknown in terms of software and hardware. Remember those unexpected Errata?
    • Documentation - A university typically has a huge amount of existing documentation and notes built up around an architecture. Think of all the tutorials and lectures that need rewriting!
    • Staff training - Your lab helpers have to know how things work, i.e. does the debugger connect every time? Do you have to reboot the PC if the thing hangs? Does the software have any quirks? (perhaps what quirks does the software have)
    These reasons are also very true for companies but I would suggest that the documentation point would be replaced with code libraries when a commercial client is involved.

    What hardware?

    Here are the factors that I think are most important in choosing a development kit. This is where ARM technologies really score - compared to proprietary architectures there is a real diversity of choice of chip manufacturer with about every possible peripheral included:
    • Cost - labs full of development kits add up pretty quickly
    • Programmer/debugger cost - see above. Don't forget that many programmer/debuggers are NOT bundled with the development boards. Also if you want your own students to have their own personal boards they are going to need programmer/debuggers.
    • Software - how easy is it to access the compilers and debuggers
    • Robustness - students are pretty hard on kit. How much work are you going to have to do to harden the boards electronically and physically?
    • On chip peripherals - if you are going to teach standard peripherals like USARTs or I2C then make sure they are included in your micro

    What software?

    We then need to consider the programming environment. Again, ARM has a real breadth of choice of IDEs and compilers for ARM based microcontrollers. The following points should be borne in mind:
    • Is there a free version for students, and if so how limited is it?
    • Is the full version terribly expensive, and under what conditions can it be accessed? Hardware companies can be more generous with software than pure software companies, as for them the software is not the core business.
    • How easy is it to support? There are plenty of development IDEs that require Administrator access to run or access the debugger and are often very unstable or full of bugs*.
    [*IDEs with bugs can be a problem for a professional engineer but for students this can be a showstopper. It is often hard enough for them to grasp the correct operation of an IDE/micro/debugger system let alone diagnose a faulty one. Also imagine a whole lab full of machines with full admin access in the hands of students? Note I am painfully experienced in getting limited admin access for certain programs, but really, why would I want to?]

    My choices

    [These choices are influenced by my relationship with ARM but they may help you work through your own choices. It is important to declare interests.]

    The two courses I am planning to support are both Interfacing courses, and I also have an eye on project work involving microcontrollers. Both courses are established and place different requirements on the hardware that is going to be used.

    What architecture?

    Looking at ARM architectures for what I wish to achieve, it is clear that we should be looking at the current generation of cores, i.e. the Cortex M or A series. The M stands for microcontroller or mixed signal and the A for application. From an architectural point of view the simpler M series is clearly preferable: we are looking at Interfacing, and the available M series parts are much better suited to that goal in terms of their peripherals. For Embedded Software, high end M parts or A parts would be just fine. For bare metal work there are some low pin count M parts which can be adapted to a DIP pinout, but it is more challenging to use ARM for that type of course.

    IDE, compiler & debugger

    I am looking at using the uVision IDE from Keil (owned by ARM) because:
    • It is compatible with a very wide range of devices from many manufacturers so you are not tied to any one single company. This will allow a lot of reuse of notes if the target microcontroller is withdrawn or updated
    • It has compilers, assemblers, and on chip debugging facilities
    • There is the essential free version available to anyone, with a code size limit well beyond what student exercises are likely to need - 32kb of code max (MDK-ARM Lite). [ARM has always supported universities strongly, so a donation of the full version for internal use in teaching and research may well be possible - talk to them]
    • Keil tools are also pretty well behaved as Windows programs and receive regular updates to fix bugs. Other IDEs I have used don't even regard some problems as bugs at all!
    • Keil donations are handled via ARM, which does not have the sale of development software as its core business, with the obvious implications for the ease of donations.
    This IDE will support both courses quite happily as well as scaling to include larger student projects. This will reduce the amount of repeated documentation needed across the two courses as we will be supporting one IDE.

    Development kits

    The two courses have quite different requirements. One course needs two boards:
    • One for the students to own by themselves - cost is a very important consideration
    • One with a large amount of IO for the labs and direct access to a memory mapped IO space
    • Neither should have too much built-in demonstration hardware (it pushes up the cost and wastes valuable IO capability)
    The other course needs a board with native USB connectivity and as much raw access to IO as possible. Both courses need all the boards to be compatible with the Keil uVision IDE/compiler/debugger suite.
    Resources to help find suitable ARM development platforms can be found on the university program section of ARM's website. Here are the highlights of the development boards I have considered:

    Low cost student owned development board

    ST STM32VL-Discovery
    The winner: ST STM32VL-Discovery
    • Micro is the ST STM32F100RB, a Cortex-M3 running at 24MHz with 128kb of flash and 8kb of RAM
    • On chip peripherals: 1x ADC, 2x DAC, Timers, 2x I2C, 3x USART, 2x SPI and something called CEC. There is also a DMA unit
    • The programmer/debugger is an ST-Link built onto the top section of the board. It can also be used to program and debug other ST STM32 ARM based microcontrollers using the ARM Serial Wire Debug (SWD) bus
    • The ST-Link is Keil uVision compatible for programming and on chip debugging
    • $12 as of the 11th of July from DigiKey
    NXP LPCXpresso
    Honourable mention: NXP LPCXpresso
    • LPCXpresso is both an IDE (powered by code_red technology) and a set of development boards for NXP's LPC ARM based microcontrollers
    • Supports various LPC families of microcontrollers based on the ARM Cortex-M3 and simpler Cortex-M0
    • Built onto a (one time detachable) NXP LPC-Link programmer/debugger that is supported by the code_red based IDE
    • $29.95 as of the 11th of July from DigiKey. Note the variety of parts available as well as more complex and higher cost LPCXpresso compatible boards
    ARM NXP mbed
    For consideration: ARM NXP mbed
    • Needs no special programming hardware or any locally installed software, as it is programmed from the mbed website using a web browser. It appears to the PC as a USB stick: place a file on it, press a button, and it programs itself
    • Seriously limited for my purposes by the lack of On Chip Debugging and simplified programming system [NB not due to the NXP micro but due to the requirement to not need locally installed software]
    • This was not a serious candidate for the level of education that I wish to engage in, however it is a very serious contender for courses in high school or the first year of university
    • Does not come with a "standard" programmer/debugger
    • $60 as of the 11th of July from DigiKey

    Lab board with native USB connectivity and external memory interface
    [NB This board is a Keil product due to my relationship with ARM, not that there are not excellent candidates from other providers.]
    [NNB Keil does make excellent boards though!]

    Keil MCB9B500 Evaluation Board
    The winner: Keil MCB9B500 Evaluation Board
    • Micro is the Fujitsu FM3 MB9BF506, a Cortex-M3 running at 80MHz with 512kb of flash and 64kb of RAM
    • Full physical access to each and every pin on the device
    • On chip peripherals:  USB2.0 Device and Host, 2x CAN, 8 channel DMA, External Bus IF supporting 8/16 bit SRAM, NOR and NAND flash with up to 8 chip selects, 8x USART/CSIO/LIN/I2C Serial ports, 8x Timers, 2x Multi-Function Timers, CRC Accelerator and 3x 16 channel ADCs
    • Apart from a few LEDs, switches, one potentiometer and the USB device and host ports, all the rest of the IO is accessible via two fantastic 0.1" pitch dual row sockets. 0.1" pitch headers are the best educational header, being small enough to be convenient but strong enough to take abuse and rough handling
    • Programmed and debugged by the ULINK-ME programmer from Keil (not shown). [NB This is not available for general purchase, only bundled with new kits, so be sure it is included]
    • Compatible with uVision IDE from Keil [NB Obvious, perhaps]
    • $100 as of the 11th of July from DigiKey (I haven't linked it here as DigiKey doesn't seem to sell the version with the bundled ULINK-ME but I can't be sure. Buyer Beware)
      [All pictures are copyright their respective owners and are reproduced here for convenience. For owner information consult the images ALT text. ARM, Cortex, Keil, uVision, mbed, LPC, ST-Link, LPC-Link, Fujitsu FM3 and ST Discovery are probably all trademarks of their respective companies.]
