Wednesday, March 16, 2011
I have moved to Purdue in the USA!
Following on from a madly busy time at Southampton, UK, I am now working for Purdue University in the USA, which should be a lot of fun!
Tuesday, June 29, 2010
The future of computing? The SpiNNaker million processor computer
The future of computing is a very big claim. If anything, however, the future of computing does not lie in a common-time, unified shared-memory system, which pretty well describes every multicore computer made at the moment. The trouble is that once you go beyond a few thousand cores, the shared-memory, shared-time concept falls to bits. It just doesn't scale. Where then should we look for a new model, other than biology? The brains of living creatures are capable of phenomenal processing power and yet have almost none of the features of the computers we build today. Don't you find this odd?
Let's look at how brains do it:
- Does each neuron have a sense of the passage of time? Not in the traditional sense and maybe not at all. If it does, it will probably be only a vague ordering.
- Does a neuron share a memory space with every other neuron? No. The closest concept is an area effect of diffusion of chemicals which affect neighbours (which may not be synaptically connected) but that is very far from a shared memory concept.
- Is each neuron connected to every other neuron? They have many connections to close neurons and some connections to neurons further away best described by a statistical distribution but otherwise, again, no.
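To make the last point a little more concrete, here is a minimal sketch of distance-dependent connectivity. The exponential fall-off, the `scale` parameter and the 1-D layout of neurons are all assumptions chosen for illustration; real neural tissue is not claimed to follow this particular distribution.

```python
import math
import random


def connection_probability(distance, scale=2.0):
    """Probability of a connection, falling off exponentially with distance.

    The exponential form and the default scale are illustrative assumptions.
    """
    return math.exp(-distance / scale)


def build_connections(n_neurons, scale=2.0, seed=0):
    """Map each neuron index to the list of neurons it connects to.

    Neurons sit at integer positions on a line (a hypothetical layout).
    """
    rng = random.Random(seed)
    connections = {i: [] for i in range(n_neurons)}
    for src in range(n_neurons):
        for dst in range(n_neurons):
            if src == dst:
                continue  # no self-connections in this toy model
            if rng.random() < connection_probability(abs(src - dst), scale):
                connections[src].append(dst)
    return connections


if __name__ == "__main__":
    conns = build_connections(200)
    total = sum(len(targets) for targets in conns.values())
    print(f"{total} connections out of {200 * 199} possible")
```

Most connections end up between near neighbours, a few reach further away, and the network is nowhere near fully connected.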
So clearly modern computing bears as much relation to a brain as sudoku does to quantum mechanics. The closest we get are the datacenters of a company such as Google, but they enjoy far too much connectivity to be a good model.
Enter the world of academia, because no commercial company would be stupid enough to commit the resources we can to a project like this (and all the ones before it which make it possible) - it is unlike anything that has come before and it is risky (i.e. it may not work!).
Project SpiNNaker
This revolutionary idea is nothing less than a plan to put together a computer of approximately 1 million cores with no common clock or shared memory and which can route messages with a model that approximates a neurological system.
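As a rough feel for what "no common clock or shared memory" means in software terms, here is a toy sketch in which cores keep purely private state and only react to packets as they arrive. The `Core` class, the `routes` table and the firing threshold are hypothetical names and values invented for this illustration; this is not how the SpiNNaker hardware or its router is actually implemented.

```python
from collections import deque


class Core:
    """A core with private state only; it reacts to packets, not to a global clock."""

    def __init__(self, core_id, threshold=2):
        self.core_id = core_id
        self.threshold = threshold  # packets needed before this core emits one
        self.received = 0           # private counter, invisible to other cores
        self.fired = 0

    def on_packet(self, source_id):
        """Handle one incoming packet; return True if this core emits a packet."""
        self.received += 1
        if self.received >= self.threshold:
            self.received = 0
            self.fired += 1
            return True
        return False


def run(cores, routes, initial_sources, max_events=1000):
    """Deliver packets until none are in flight or max_events is reached.

    routes maps a source core id to the ids of the cores receiving its packets.
    Returns the number of packets delivered.
    """
    in_flight = deque()
    for src in initial_sources:
        for dst in routes.get(src, []):
            in_flight.append((src, dst))
    delivered = 0
    while in_flight and delivered < max_events:
        src, dst = in_flight.popleft()
        delivered += 1
        if cores[dst].on_packet(src):
            for nxt in routes.get(dst, []):
                in_flight.append((dst, nxt))
    return delivered


if __name__ == "__main__":
    cores = {i: Core(i) for i in range(3)}
    routes = {0: [2], 1: [2], 2: []}
    print(run(cores, routes, initial_sources=[0, 1]), cores[2].fired)
```

The only thing the cores share is the stream of packets; ordering comes from arrival, not from a system-wide tick.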
Born in the University of Manchester's Advanced Processor Technologies Group, the SpiNNaker project has taken shape in collaboration with the University of Southampton's School of Electronics and Computer Science (where I work), the Engineering and Physical Sciences Research Council and two enlightened commercial companies: ARM (for the processor IP) and Silistix (for their Network on Chip expertise).
SpiNNaker is a child of many parents, each a vital step on the path to its genesis:
- AsipIDE GALS Design and Co-Simulation Framework - A hardware/software co-design and debugging framework
- Transactional Memory - A new locking strategy which massively simplifies the ability to lock data to pass information between processes
- TERAFLUX: Exploiting Dataflow Parallelism in Teradevice Computing - A pan-European project looking at computers with massive numbers of cores
- The Balsa Asynchronous Synthesis System - A language and compiler targeting asynchronous, handshake-driven logic design
A seriously ambitious project you say? Absolutely right. Vapourware or a dream? Hell no! Feast your eyes on this:
This is the first generation of SpiNNaker SoCs on a test board - there are four dual-processor SoCs on this board, each next to its accompanying 1 Gbit RAM chip. This board already holds 8 processors asynchronously interconnected with all the necessary debug hardware to perfect the design of the next generation.
Who ever said British Science was dull?
BIG DISCLAIMER: I am not one of the great minds trying to change the world with this project. At best I have helped a couple of people around the edges. I am a very big fan, however!
Thursday, April 22, 2010
A fixed OpenSPARC T2 build for Design Compiler 2009....
Just a very quick note to say that we have fixed the compile problems for the OpenSPARC T1 processor when building it with modern versions of Design Compiler (any release after 2007).
More information, including sizes and speeds on the Synopsys 90nm EDK to follow. Bear in mind it is missing the PLL and a couple of other small modules.
We will, of course, push this upstream.
Wednesday, March 31, 2010
Current activity with the EVE ZeBu Hardware/Software Co-Verification Environment
The title is a fancy way of referring to the EVE ZeBu accelerators (much more information on their website). I previously posted on our acquisition of a UF-2 (I should mention we liked it so much we now have 2!) so I am taking a moment to show what we are up to with this wonderful technology. I have broken it into research and teaching topics.
Research
Behavioural Simulation and Synthesis of Biological Neuron Systems using VHDL
The investigation of neuron structures is an incredibly difficult and complex task that yields relatively low rewards in terms of information from biological forms (either animals or tissue). The structures and connectivity of even the simplest invertebrates are almost impossible to establish with standard laboratory techniques, and even when this is possible it is generally time consuming, complex and expensive. Recent work has shown how a simplified behavioural approach to modelling neurons can allow “virtual” experiments to be carried out that map the behaviour of a simulated structure onto a hypothetical biological one, with correlation of behaviour rather than underlying connectivity. The problems with such approaches are numerous. The first is the difficulty of simulating realistic aggregates efficiently, the second is making sense of the results and finally, it would be helpful to have an implementation that could be synthesised to hardware for acceleration. In this paper we present a VHDL implementation of neuron models that allows large aggregates to be simulated. The models are demonstrated using a synthesizable system level VHDL model of the C. elegans locomotory system.
The role of the EVE in this specific project is to verify and execute the largest of the neural network models using co-simulation, replacing the previous, limited technology based around an FPGA board driven by a probe program.
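For readers who have not met behavioural neuron models before, the sketch below shows the general flavour using a leaky integrate-and-fire neuron. This is a textbook simplification picked for illustration, not the VHDL model from the paper, and the `threshold`, `leak` and `reset` values are assumptions.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    input_current is a sequence with one input value per time step.
    Each step the membrane potential decays by the factor `leak`, the
    input is added, and a spike is recorded when `threshold` is reached.
    """
    potential = reset
    spikes = []
    for step, current in enumerate(input_current):
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(step)
            potential = reset  # start integrating again after a spike
    return spikes


if __name__ == "__main__":
    # A constant drive with no leak spikes on every second step.
    print(simulate_lif([0.5] * 6, leak=1.0))
```

The point of a behavioural model is that only the input/output behaviour matters, which is what makes large aggregates cheap enough to simulate or synthesise.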
Publications
Bailey, J., Wilson, P., Brown, A. and Chad, J. (2008) Behavioural Simulation and Synthesis of Biological Neuron Systems using VHDL. In: BMAS. (In Press)
Bailey, J., Wilson, P. R., Brown, A. D. and Chad, J. (2007) Behavioural Simulation of Biological Neuron Systems using VHDL and VHDL-AMS. In: IEEE Behavioural Modeling and Simulation, Sep 2007, San Jose, USA. pp. 153-158.
Architectures for Numerical Computation
Since the 1960s, the observation that has become known as Moore’s Law has become a self-fulfilling prophecy. Processing power doubles every two years because of the advances in CMOS technology. There are clear signs, however, that these technological advances are coming to an end. The economics of pushing CMOS technology to its physical limits will eventually halt further development.
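To put a number on what doubling every two years implies, a quick back-of-the-envelope calculation helps. The function name and the idea of treating the doubling period as a parameter are just conveniences for this sketch.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Factor by which processing power grows if it doubles every period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 10, 20):
        print(f"{years} years -> x{growth_factor(years):g}")
```

Under this assumption a decade gives a factor of 32 and two decades a factor of 1024, which is why the end of this trend matters so much.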
If it is no longer feasible to increase computing power through smaller, faster transistors, the alternative is massive parallelism. This progression is already apparent. Multi-core and multi-threaded processors are now common. Although modern operating systems are able to use multiple cores, with few exceptions, programs are confined to single cores. The challenge facing software engineers is to make best use of multiple cores.
A significant amount of processing power is concerned with numerical computation. Consumer applications, such as image and audio processing are fundamentally numerical. Similarly, engineering applications, such as simulation and optimization rely on numerical calculations. At this point, we should distinguish between consumer and desktop applications and High-Performance Computing (HPC) tasks that rely on clusters of dedicated processors. It is not our intention to move into the HPC world at this time.
While using multiple cores can accelerate many numerical algorithms, far greater speed-up would be possible using more specialized forms of hardware, such as GPUs and FPGAs. A further consideration is power consumption (and the related problem of heat dissipation). Custom hardware can reduce power consumption by an order of magnitude or more. The key, of course, is to use the resources in the best possible way. In the context of the work proposed here, there are two aspects to this problem. First, we need to make the best division between hardware and software and second, we need to design an appropriate overall architecture.
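One simple way to reason about the hardware/software split is Amdahl's law, which is a standard model brought in here for illustration rather than something taken from the proposal itself. The sketch assumes a fixed fraction of the run time can be moved to an accelerator that is a given factor faster, and ignores data transfer costs.

```python
def overall_speedup(accelerated_fraction, accelerator_speedup):
    """Amdahl's law: whole-program speedup when only part of it is accelerated.

    accelerated_fraction: share of the original run time moved to hardware (0..1).
    accelerator_speedup: how many times faster the hardware runs that share.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Offloading 90% of the work to hardware that is 100x faster.
    print(round(overall_speedup(0.9, 100.0), 2))
```

Even a very fast accelerator gives a modest overall gain if too much of the work stays in software, which is why the division matters as much as the architecture.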
The obvious role of the EVE platform in this research field is to support the research into specific computing pipelines, fine grain computation blocks and architectures as well as enabling the development of some 16 lane PCIe computation accelerators.
Teaching
Hosting a Complex SoC on the EVE Platform
This project was a proof of concept and de-risking of the EVE transactor flow using a large SoC. The SoC chosen was the Gaisler-Aeroflex LEON3 (http://www.gaisler.com). The LEON3 SoC is based around a SPARC v8 compatible CPU and is written in VHDL. The minimal LEON3 SoC was built using the EVE support for memory to model the processor cache, along with the transactors for DRAM and UARTs. This work will be extended to include the VGA, Ethernet and USB transactors on the hardware side, and to include support for the Snapgear Linux version for the LEON3.
Verification of a highly integrated ASIC
A large master's-level project framework will produce a heterogeneous multicore ASIC to perform processing on HD video data streams. It will bundle a 32-bit microcontroller core, on-chip SRAM and our custom geometric processor, along with a multilayer AMBA bus architecture optimised for power and contention. The EVE will be an invaluable support to the simulation and verification of the final design before it is sent for manufacture.
So there you have it - quite a lot going on, all of which is really fascinating and fun!
Wednesday, March 17, 2010
EWME2010 and Synopsys' EDK
Firstly, EWME 2010 is coming in May. It is the European Workshop on Microelectronics Education which, along with its sister conference in the USA, Microelectronic Systems Education (MSE), is pretty much unique in the world.
If you need help or advice on starting a programme at your institution then this is a great opportunity to meet a very cohesive, dedicated group of educators.
EWME and MSE are characterised by fair and honest discussions on the challenges and tools available to educators in this sphere, along with a showcase of our best student projects and modules. Papers detail how to run the course administratively as well as the practical aspects. We also get a few vendors who are always pleased to put you in contact with the appropriate University Programmes.
Secondly, the Synopsys 90nm EDK and our experiences with it. We have been users of this EDK for about a year now and it forms a great basis for working with our students, partly because the established notes and curriculum resources make it very quick to get off the ground in several complex areas. Compared to the IDESA Advanced Digital Physical Implementation Flow course, which promotes the TSMC 90nm design kit, this one is far more student friendly with a lower startup cost (it also doesn't require attendance to get the notes and access to the labs). [However, for research the IDESA course is a fantastic introduction to 90nm and below.]
See you at EWME 2010 in Darmstadt!
Wednesday, January 27, 2010
IBM PowerPC™ 405-S: Verification of the Gate Level Netlist with the supplied Artisan SRAMs
[Please note that these instructions reference the 1.00a version of the Synopsys PowerPC™ 405-S coreKit which is the version provided in the University IP package from IBM. There are updated versions available from Synopsys.]
This is the fourth article in this series describing the IBM PowerPC™ 405 synthesizable core and peripherals, and it covers functionally and formally verifying the synthesized gate-level structural Verilog netlist.
Firstly let us look at the two main classes of verification:
- Formal Verification
- Functional Verification
Luckily for us the coreKit provides scripts and support for automating both activities. The completed netlist (with scan chains, more on their importance later) can be put into the same simulation framework that verifies the RTL is fully functional. The completed netlist can also be formally verified against the source RTL by using Synopsys Formality™.
Formal Verification
The relevant part of the instruction manual (docs/405_ivug.pdf) is:
Chapter 9: Gate-Level Verification
So for the following process assume you have used this series of articles to synthesize a PowerPC™ 405-S core with scan chains using the Artisan PDK and the files in the list below are in their correct locations.
You will need the following:
- A synthesized gate-level netlist in structural Verilog (typically ./dc/netlist/PPC405F5V1_dft.v)
- Your technology library in Synopsys DB form (if you are following the supplied Artisan flow with SRAMs it is ./tech_lib/artisan_13lvfsg/syn/slow.db)
- Synopsys Formality™, Synopsys' premier formal equivalence checker
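Before launching anything it can save time to confirm that the inputs listed above are where the flow expects them. The following is a small pre-flight sketch; the `missing_inputs` helper is a hypothetical convenience of mine and not part of the coreKit, and the paths are simply the typical locations quoted in the list.

```python
from pathlib import Path

# Typical locations from the list above, relative to the workspace root.
REQUIRED_FILES = [
    "fm/run_fm.gate",
    "dc/netlist/PPC405F5V1_dft.v",
    "tech_lib/artisan_13lvfsg/syn/slow.db",
]


def missing_inputs(workspace):
    """Return the required files that are not present under the workspace."""
    root = Path(workspace)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        print("All inputs found")
```

If your netlist or library lives elsewhere, adjust the list to match rather than moving the files.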
From the ./fm directory in the current workspace:
./run_fm.gate ../src/ ../tech_lib/artisan_13lvfsg/syn/slow.db ../dc/netlist/PPC405F5V1_dft.v $SYNOPSYS
Output:
Assuming you entered the command correctly Formality™ should start and the process will begin by synthesising the original RTL. This will produce a large number of warnings but in this case this is normal and expected.
It then sets a few constants disabling the scan mode in the design:
    set_constant ref:/WORK/$MODULE/TESTC405SCANENABLE -type port 0
    Set 'ref:/WORK/PPC405F5V1/TESTC405SCANENABLE' to constant 0
    1
    set_constant impl:/WORK/$MODULE/TESTC405SCANENABLE -type port 0
    Set 'impl:/WORK/PPC405F5V1/TESTC405SCANENABLE' to constant 0
    1
It now checks the designs and issues warnings for the RAM models.
[A “black box” is the term for a section or module of the circuit that is not visible to the tool so it is warning that it cannot verify the contents.]
    Status: Checking designs...
    Warning: Design ref:/DATARAM_64X34/dataram_64X34 is a black box and there are cells referencing it (FM-160)
    Warning: Design ref:/SRAM256X46/sram256x46 is a black box and there are cells referencing it (FM-160)
    Warning: Design ref:/SRAMBYTWR512X128/sramBytWr512x128 is a black box and there are cells referencing it (FM-160)
    Warning: Design ref:/SRAM512X8/sram512x8 is a black box and there are cells referencing it (FM-160)
And finally compares the design at the matching points, producing this worrying looking output:
    Status: Verifying...
    ....
    Compare point C405TESTSCANOUT0 failed (is not equivalent)
    Compare point C405TESTSCANOUT1 failed (is not equivalent)
    Compare point C405TESTSCANOUT2 failed (is not equivalent)
    Compare point C405TESTSCANOUT3 failed (is not equivalent)
    Compare point C405TESTSCANOUT4 failed (is not equivalent)
    Compare point C405TESTSCANOUT5 failed (is not equivalent)
    Compare point C405TESTSCANOUT6 failed (is not equivalent)
    .
    Compare point C405TESTSCANOUT7 failed (is not equivalent)
    .
    ********************************* Verification Results *********************************
    Verification FAILED
    ATTENTION: RTL interpretation messages were produced during link of reference design.
               Verification results may disagree with a logic simulator.
    ATTENTION: 8 failing compare points have unmatched undriven signals in their reference fan-in.
               To report such failing points, use "report_failing_points -inputs unmatched -inputs undriven".
               8 such failing compare points are directly undriven primary output ports.
               To report directly undriven failing primary output ports, use "report_failing_points -point_type directly_undriven_output".
               To suppress verification of directly undriven primary output ports, use "set_dont_verify_point -directly_undriven_output".
               To read about undriven signal handling, use "man verification_set_undriven_signals".
    ----------------------------------------------------------------------------------------
    Reference design: ref:/WORK/PPC405F5V1
    Implementation design: impl:/WORK/PPC405F5V1
    16662 Passing compare points
    8 Failing compare points
    0 Aborted compare points
    0 Unverified compare points
    ----------------------------------------------------------------------------------------
    Matched Compare Points      BBPin    Loop   BBNet     Cut    Port     DFF     LAT   TOTAL
    ----------------------------------------------------------------------------------------
    Passing (equivalent)         1204       0       0       0     578   14679     201   16662
    Failing (not equivalent)        0       0       0       0       8       0       0       8
    Not Compared
      Clock-gate LAT                                                                1       1
    ****************************************************************************************
    Info: Formality Guide Files (SVF) can improve verification success by automating setup.
    0
However, don't be downhearted! All Formality™ has done is its job and spotted a difference: the scan chains you inserted during the compile are not present in the source RTL. So ignoring the scan chains, as far as the tool can tell, this is the design you wanted the synthesis tool to manufacture!
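If you would rather not eyeball the log each time, a few lines of script can confirm that the only failing compare points are the scan outputs. This is a sketch that assumes the log lines look exactly like the "Compare point ... failed" lines shown above; the function names and the `C405TESTSCANOUT` prefix check are my own choices, not something Formality™ provides.

```python
import re

# Matches lines such as:
#   Compare point C405TESTSCANOUT0 failed (is not equivalent)
FAILED_POINT = re.compile(r"Compare point (\S+) failed")


def failing_points(log_text):
    """Return the names of all compare points reported as failed."""
    return FAILED_POINT.findall(log_text)


def unexpected_failures(log_text, expected_prefix="C405TESTSCANOUT"):
    """Return failing points that are not scan outputs (empty list is good)."""
    return [p for p in failing_points(log_text) if not p.startswith(expected_prefix)]


if __name__ == "__main__":
    sample = (
        "Compare point C405TESTSCANOUT0 failed (is not equivalent)\n"
        "Compare point C405TESTSCANOUT7 failed (is not equivalent)\n"
    )
    print(failing_points(sample))
    print(unexpected_failures(sample))
```

Anything returned by `unexpected_failures` is a difference that the scan chains do not explain and deserves a proper look.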
Functional Verification
This is described in the same section of the manual as Formal Verification. The prerequisite is a fully functioning RTL verification environment - if it didn't work on the RTL it isn't going to on the netlist.
The relevant files are:
- sim/testbench/p405s_test_top.v - this is the Verilog file containing the top level testbench. Near the top of the file it contains a very useful parameter, parameter simulation_cycle = 100;, which sets the whole simulation speed in terms of the clock applied to the CPU (not the PLB). It is also sensitive to a define called GATE_SIM, which changes the behavior of the testbench to accommodate the gate-level netlist.
- sim/scripts/run405.config - this file contains the parameters of the simulation "run" about to take place and needs editing to support the gate-level netlist simulation
run405.config is the key file: it needs a couple of modifications to simulate the netlist rather than the RTL. It is a list of parameters organised into blocks which are read by the Perl script scripts/runTB. scripts/runTB then builds and executes the correct command line to run a particular simulation in your chosen simulator.
Open the script scripts/run405.config in your favourite text editor and look for the second block:
searchpath
./../src/rtl
./../src/mem_models
$SYNOPSYS/packages/gtech/src_ver
$SYNOPSYS/dw/sim_ver
./testbench
./vera/ver_shell
end
As you can see a 'block' has a header identifying it and is closed by an end keyword. The modification to this block is to comment out the ./../src/rtl by adding a # in front of the line like so:
searchpath
#./../src/rtl
./../src/mem_models
$SYNOPSYS/packages/gtech/src_ver
$SYNOPSYS/dw/sim_ver
./testbench
./vera/ver_shell
end
By doing this we exclude the unsynthesised RTL from being discovered by the simulator which is what we want. The netlist should contain the complete circuit.
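If you switch between RTL and gate-level runs often, the same edit can be scripted. This is only a sketch under stated assumptions: comment_out_entry is a hypothetical helper (it is not part of runTB), and it assumes the block header, each entry and the end keyword each sit on their own line, as the instruction to put a # "in front of the line" suggests.

```python
def comment_out_entry(config_text, block, entry):
    """Prefix `entry` with '#' inside the named block; leave everything else untouched."""
    out = []
    in_block = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if not in_block and stripped == block:
            in_block = True            # found the block header, e.g. "searchpath"
        elif in_block and stripped == "end":
            in_block = False           # the block is closed by the end keyword
        elif in_block and stripped == entry:
            line = line.replace(entry, "#" + entry, 1)
        out.append(line)
    return "\n".join(out) + "\n"


config = (
    "searchpath\n"
    "./../src/rtl\n"
    "./../src/mem_models\n"
    "end\n"
    "differentfile\n"
    "./../src/rtl/p405s_params.v\n"
    "end\n"
)

print(comment_out_entry(config, "searchpath", "./../src/rtl"))
```

Because an already commented line no longer matches the entry exactly, running the helper twice leaves the file unchanged, and entries in other blocks are never touched.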
Take a look at the third block:
differentfile
./../src/rtl/p405s_params.v
end
In this block we need to explicitly tell the simulator to compile the simulation models for the Artisan PDK cells from the technology library, the synthesised netlist itself and a wrapper called PPC405F5V1_soft.v:
differentfile
./../src/rtl/p405s_params.v
./../src/rtl/PPC405F5V1_soft.v
./../dc/netlist/PPC405F5V1_dft.v
./../tech_lib/artisan_13lvfsg/verilog/tsmc13_hs_modified_new.v
end
Finally, we need to add the GATE_SIM define already mentioned above. This makes the testbench behave slightly differently. When simulating the RTL, the testbench forces the state of certain internal registers to 0 at the start of every run. With a gate-level netlist those registers may no longer exist, and certainly may not exist with the same name or hierarchy. To achieve the same effect when GATE_SIM is defined, at the start of each run the testbench places the chip into scan mode and clocks 0s through the scan chains, forcing all flops to a starting value of 0.
The correct place to set GATE_SIM depends on your simulator, so have a look at the blocks at the bottom of this file:
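To see why clocking zeros through a scan chain gives every flop a known starting value, here is a toy model. It is purely illustrative and is not the testbench's code: the chain length, the "x" marker for an unknown value and the function names are all assumptions made for this sketch.

```python
def shift_once(chain, scan_in=0):
    """One clock in scan mode: each flop takes the value of the flop before it."""
    return [scan_in] + chain[:-1]


def clocks_to_zero(chain):
    """Clock 0s into the chain until every flop holds 0; return the chain and clock count."""
    clocks = 0
    while any(value != 0 for value in chain):
        chain = shift_once(chain, scan_in=0)
        clocks += 1
    return chain, clocks


# "x" stands for an unknown power-up value, as a simulator would show it.
start = ["x", 1, "x", 0, "x"]
final, clocks = clocks_to_zero(start)
print(final, clocks)
```

However the flops power up, the model never needs more clocks than there are flops in the chain, which is why the long scan-enable phase at the start of each gate-level run is enough to zero the design.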
# This is a string appended to the MTI invocations
mtioptions
+define+UNITSIM +nospecify +define+GATE_SIM
end
# This is a string appended to the NC invocations
ncoptions
+define+UNITSIM +define+SYN_RTL +nospecify
end
These two example blocks (there are others) refer to Mentor Graphics' ModelSim (mtioptions) and Cadence's NC Verilog (ncoptions). To add the define for NC Verilog, change the options line to read as follows:
# This is a string appended to the NC invocations
ncoptions
+define+UNITSIM +define+SYN_RTL +nospecify +define+GATE_SIM
end
You should now be ready to fire up coreConsultant and begin.
[Note: You can run the tests outside of coreConsultant but it won't generate the nice, navigable HTML reports. To do this use the command runtest from the sim directory.]
Changes in coreConsultant
1) Start coreConsultant by running the command coreConsultant in a terminal window.
2) Open the workspace you created to run the RTL functional verification from the previous article and you should see the window below:
Don't press 'Apply' until the last step at the bottom or it will start the simulation immediately and there are other settings to change!
3) Ensure the settings are correct in the 'Simulator' selection before moving on to 'Select Test Options'.
4) In 'Select Test Options' choose Run_Gate_Sims. Move onto 'Select Testsuite'.
5) In 'Select Testsuite' set the path to the Netlist and the simulation library:
6) You are all done. Click 'Apply' to start the simulations.
The results of the running simulations are visible in the test.log file which can be monitored in realtime by executing the following command in a terminal window from the sim directory:
tail -f test.log
Studying this file you should see the following (for NC Verilog but each simulator will issue warnings of this kind):
During the compilation you will get a lot of complaints about unconnected ports. This is to be expected: the individual cells from the cell library, especially the flops, have a fixed set of outputs that cannot be removed. They come with the cell:
TLATNX8HS LOCKUP1 ( .D(dp_scReg_icuStatus1_regL2[5]), .GN(sc), .Q(scan_out2)
                 |
ncelab: *W,CUVWSP (../dc/netlist/PPC405F5V1_dft.v,123147|18): 1 output port was not connected:
ncelab: (../tech_lib/artisan_13lvfsg/verilog/tsmc13_hs_modified_new.v,28144): QN
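With thousands of these messages it is easy to miss a warning that actually matters. A small tally by warning code helps; the sketch below assumes the "ncelab: *W,<CODE>" format seen in the excerpt above, treats CUVWSP as the only expected code, and uses a made-up code (HYPOTH) purely to show the filtering. The function names are hypothetical.

```python
import re
from collections import Counter

# Format taken from the single ncelab warning shown above.
WARNING_CODE = re.compile(r"ncelab: \*W,(\w+)")


def tally_warnings(log_text):
    """Count elaborator warnings by their code."""
    return Counter(WARNING_CODE.findall(log_text))


def unexpected_codes(counts, expected=("CUVWSP",)):
    """Return the warning codes that are not on the expected list."""
    return sorted(code for code in counts if code not in expected)


sample_log = (
    "ncelab: *W,CUVWSP (a.v,10|18): 1 output port was not connected:\n"
    "ncelab: (lib.v,28144): QN\n"
    "ncelab: *W,CUVWSP (a.v,20|18): 1 output port was not connected:\n"
    "ncelab: *W,HYPOTH (a.v,30|5): made-up warning for illustration\n"
)

counts = tally_warnings(sample_log)
print(counts["CUVWSP"], unexpected_codes(counts))
```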
After the compilation things should settle down into a format similar to this, one per test:
ncsim: 08.10-s006: (c) Copyright 1995-2008 Cadence Design Systems, Inc.
Loading snapshot worklib.p405s_test_top:v .................... Done
=== Verilog with Synopsys Vera ===
ncsim> source /home/esdcad/software/cadence/linux/ius81/tools/inca/files/ncsimrc
ncsim> run
++---------------------------------------------------------------------++
|| VERA System Verifier (TM) ||
|| Version: A-2007.12 () -- Wed Dec 9 17:15:27 2009 ||
|| Copyright (c) 1995-2004 by Synopsys, Inc. ||
|| All Rights Reserved ||
|| ||
|| For support, send email to vera-support@synopsys.com ||
|| ||
|| This software and the associated documentation are confidential ||
|| and proprietary to Synopsys Inc. Your use or disclosure of this ||
|| software is subject to the terms and conditions of a written ||
|| license agreement between you, or your company, and Synopsys, Inc. ||
++---------------------------------------------------------------------++
Vera: Loading main "p405s_top_vera" (path = "p405s_test_top.top_vera")
Vera: Loading ../../vera/lib/p405s_top.vro..
Vera: Loading ../../vera/lib/p405s_memory.vro..
Vera: Loading ../../vera/lib/p405s_slave.vro..
Vera: Loading main "p405s_dcrmon" (path = "p405s_test_top.dcrmon_model")
Vera: Loading ../../vera/lib/p405s_dcrmon.vro..
Vera: Loading main "p405s_dcr" (path = "p405s_test_top.dcr_model")
Vera: Loading ../../vera/lib/p405s_dcr.vro..
Vera: Loading main "p405s_isocm" (path = "p405s_test_top.isocm_model")
Vera: Loading ../../vera/lib/p405s_isocm.vro..
Vera: Loading main "p405s_dsocm" (path = "p405s_test_top.dsocm_model")
Vera: Loading ../../vera/lib/p405s_dsocm.vro..
Vera: Loading main "p405s_intr_ctrl" (path = "p405s_test_top.intr_controller")
Vera: Loading ../../vera/lib/p405s_intr_ctrl.vro..
Vera: Loading main "p405s_jtag" (path = "p405s_test_top.jtag_controller")
Vera: Loading ../../vera/lib/p405s_jtag.vro..
Vera: Loading main "p405s_monitor" (path = "p405s_test_top.monitor_slave_side")
Vera: Loading ../../vera/lib/p405s_monitor.vro..
>> Setting TEST_c405ScanEnable = 1 <<
Entering the PLB Initialization sequence time=0
Entering Reset delay sequence time=1000
Completing Reset delay sequence time=2999000
>> Setting TEST_c405ScanEnable = 0 <<
>> Forcing Reset of D-CACHE Tags <<
3002000 info: Slave0: is in Reset
3003000 info: Slave0: is in Reset
3004000 info: Slave0: is in Reset
3005000 info: Slave0: is in Reset
3006000 info: Slave0: is in Reset
3007000 info: Slave0: is in Reset
3008000 info: Slave0: is in Reset
3009000 info: Slave0: is in Reset
3010000 info: Slave0: is in Reset
3011000 info: Slave0: is in Reset
3012000 info: Slave0: is in Reset
3013000 info: Slave0: is in Reset
>> Forcing Reset of D-CACHE Tags <<
DCRMON (3797000): DCR Write : Addr = 44 WData = bfff0000
DCRMON (3907000): DCR Write : Addr = 0 WData = 0
DCRMON (3922000): DCR Write : Addr = 11 WData = 0
DCRMON (3977000): DCR Write : Addr = 8 WData = 0
DCRMON (3997000): DCR Write : Addr = 1 WData = 0
DCRMON (4001000): DCR Write : Addr = 9 WData = 0
DCRMON (4005000): DCR Write : Addr = a WData = 0
DCRMON (4014000): DCR Write : Addr = 20 WData = 0
DCRMON (4028000): DCR Write : Addr = 3 WData = 0
DCRMON (4109000): DCR Write : Addr = b WData = 0
DCRMON (4113000): DCR Write : Addr = c WData = 0
6488000 info: Slave0: MasterId=1 Request for Write addr=00000010
6488000 info: Slave0: 1 to MultiByte Write
TESTCASE took 3474 Clocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TESTCASE "addfvt" PASSED
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Vera: finish encountered at time 6493000 cycle 6494
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 10739 expect: 0 sample: 49891 sync: 16493
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 6493 expect: 0 sample: 3199836 sync: 607518
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 19494 expect: 0 sample: 64898 sync: 32462
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 6517 expect: 0 sample: 32469 sync: 19482
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 3882 expect: 0 sample: 56947 sync: 12987
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 11903 expect: 0 sample: 43549 sync: 12987
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 6737 expect: 0 sample: 33265 sync: 13564
Vera: finish encountered at time 6494000 cycle 6495
  total mismatch: 0 vca_error: 0 fail(expected): 0 drive: 1 expect: 0 sample: 140 sync: 108
TESTCASE "addfvt" PASSED
You can see the Vera banner at the top, the individual Vera modules loading. The only difference to the RTL simulations are the following two lines:
>> Setting TEST_c405ScanEnable = 1 <<
and
>> Setting TEST_c405ScanEnable = 0 <<
which are found after the last Vera module loads at the start of the simulation. This is the effect of the GATE_SIM define added earlier and is showing the activation of the scan chains to zero the flops internally.
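When the whole regression has run, test.log gets long, so a short script to summarise it can save some scrolling. This is a sketch only: the helper names are hypothetical, the PASSED line and the two scan-enable lines are copied from the log above, and the FAILED wording is an assumption because no failing test appears in this article.

```python
import re

# PASSED is taken from the log above; FAILED is an assumed counterpart.
RESULT = re.compile(r'TESTCASE "([^"]+)" (PASSED|FAILED)')

SCAN_ON = ">> Setting TEST_c405ScanEnable = 1 <<"
SCAN_OFF = ">> Setting TEST_c405ScanEnable = 0 <<"


def summarize(log_text):
    """Map each test name to its last reported status (each result is printed twice)."""
    results = {}
    for name, status in RESULT.findall(log_text):
        results[name] = status
    return results


def scan_reset_seen(log_text):
    """True if scan enable was raised and later dropped, i.e. GATE_SIM took effect."""
    on = log_text.find(SCAN_ON)
    off = log_text.find(SCAN_OFF)
    return on != -1 and off > on


sample_log = (
    ">> Setting TEST_c405ScanEnable = 1 <<\n"
    "Entering Reset delay sequence time=1000\n"
    ">> Setting TEST_c405ScanEnable = 0 <<\n"
    'TESTCASE "addfvt" PASSED\n'
    'TESTCASE "addfvt" PASSED\n'
)

print(summarize(sample_log), scan_reset_seen(sample_log))
```

If scan_reset_seen comes back False for a gate-level run, the GATE_SIM define probably did not reach the simulator and the options block in run405.config is the first place to look.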
And finally you should be greeted with a report in coreConsultant that looks like this:
There, you have now formally and functionally verified your very own PowerPC™ 405-S netlist. All it needs is to be placed and routed (and finally functionally and formally verified again) and you are good to go!
Published with permission from IBM and Synopsys
Friday, January 15, 2010
A word from a Lecturer on the use of Wikipedia to students everywhere
The use of Wikipedia as a reference is a depressingly common crime committed by students but it is spreading to professional areas as well.
Wikipedia is an excellent introduction to a topic, but never a reference source. I often look up things on Wikipedia but I never use it as a reference or for something important without confirming it by some real, named, published and reviewed document.
The best summation of the structural issues of Wikipedia comes from the Sheldon cartoon by Dave Kellett (with the author's permission, http://www.sheldoncomics.com/archive/071213.html):
(... and outside academia, God help you if your boss has found you have bet the company or a product on data gathered from Wikipedia that you haven't confirmed via other research.)
Real life works on named sources that are reliable, not ephemera.