Saturday, March 14, 2009

CCD, My read of Certess technology and positioning

 

With due respect to the technology behind Certess's tool, I have some discomfort with the way it is being positioned, at least in the article below:

http://www.edadesignline.com/howto/215600203;jsessionid=TP12OA3IF1X3UQSNDLOSKHSCJUNN2JVN?pgno=2

Before I talk about my discomfort, let me state the positives: not very often do we get to read such a well-written, all-encompassing technical article. Kudos to Mark Hampton: he touches on every aspect of functional verification in this article, which is not so common in an EDA product "promotional" article (and that, unfortunately IMHO, is how this one may be characterized). Having said that, I personally believe Certess should position the technology "along with" existing ones rather than challenging or trying to replace time-tested, well-adopted methodologies such as code coverage, functional coverage, etc. It is not that I differ from his views on the shortcomings of these technologies; rather, I go by what Pradip Thakcker said at DVM 08 (http://vlsi-india.org/vsi/activities/2008/dvm-blr-apr08/program.html):

"Code coverage and functional coverage are useful techniques with their own strengths and weaknesses. Rather than worrying about their weaknesses, focus on the positives and use them today." (Pradip, during his talk "Holistic Verification: Myth or The Magic Bullet?")

I would be very glad if Certess focused on its real strength of exposing the lack of checkers in a verification environment rather than trying to "eat" into the well-established market of code/functional coverage tools. Another rationale: both coverage and qualification are compute-intensive, and given the amount of EDA investment that has gone into stabilizing and optimizing these features, it would be irrational to try to replace them with "functional qualification" (no offense meant; I have great respect for Mark, given his excellent article and of course the product). With SpringSoft acquiring Certess, hopefully their customer base and reach will increase, and that will throw up more success stories in the coming months and quarters. So good times ahead!

3 comments:

Mark Hampton said...

Hi Srinivasan,

Thanks for your kind remarks, and thanks even more for your criticism ;)

You are right that functional qualification can be complementary to both code coverage and functional coverage.

pragma marketing_on Typically, when we first engage with a new client, this is exactly how Certitude is used. By introducing functional qualification the user gets additional information; they don't need to change their methodology. pragma marketing_off

That additional information is regarding two different types of issues:

1) Non-detected faults, i.e. faults that are activated by the input vectors and propagate to an observable location in the design (e.g. top level outputs), but where the checking is missing or not working correctly. It is impossible to find these types of problems with code coverage or functional coverage.
2) Non-propagated faults, i.e. faults that are activated (covered) by the input vectors but do not propagate to an observable location. This means that some path is not being exercised by the input vectors, so bugs could go unseen. Code coverage does not consider paths from inputs to checkers, so it is impossible to find these types of problems with code coverage. Functional coverage could in theory be added for the related path, but in reality the number of paths and the complex temporal relations mean it is impossible, with limited resources, to write all the necessary functional coverage code (and that functional coverage code is also likely to contain errors). So for this class of problem I would also say functional qualification is complementary. (A small sketch of both fault classes follows this list.)
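
To make the two classes concrete, here is a deliberately tiny SystemVerilog sketch of my own (illustrative only; neither the design nor the mutated line reflects Certitude's actual fault model):

// A fault is an artificial change to the design, e.g. dropping a term from
// one assignment, and we ask whether any testcase fails because of it.

module priority_arb (
  input  logic clk, rst_n,
  input  logic req0, req1,
  output logic gnt0, gnt1
);
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      gnt0 <= 1'b0;
      gnt1 <= 1'b0;
    end else begin
      gnt0 <= req0;
      gnt1 <= req1 && !req0;   // a fault could mutate this term to just "req1"
    end
  end
endmodule

// Non-propagated fault: if no testcase ever raises req0 and req1 together,
// the faulty design behaves identically at the outputs, so the path is not
// exercised and a real bug here could go unseen.
//
// Non-detected fault: some testcase does raise both requests, so gnt1
// differs at the outputs, but if the environment only ever checks gnt0,
// no test fails: the checker is missing.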

It is often very surprising just how poor the propagation scores are for verification environments that have achieved 100% (analyzed) code and functional coverage. In such a situation, it would not be surprising (for me) to find 30-40% of the faults not being propagated when a team uses functional qualification for the first time. This is normal because they have tended to think in terms of coverage points, not paths from inputs to checkers; just changing this perception can make a big improvement for the next project.

When running Certitude, the product needs to run each testcase once (i.e. run a full regression) to collect information about the behavior of the testcases in the context of the design. This is the collection of the "activation" information. This is similar to code coverage information, so clients often stop using code coverage and use the non-activated fault information to identify typical code coverage problems. In this sense functional qualification is not complementary but provides a superset of information that includes the code coverage. While it would be possible to run a second regression to collect the code coverage results, this consumes unnecessary simulation and results in the user having to manage additional data. But at the beginning the user may decide to keep using code coverage, and there is no problem with doing this.

Regarding functional coverage, I think this is a most interesting debate. It seems functional coverage was initially developed to manage the pseudo-random nature of HVL verification: it is needed to be able to rapidly interpret what the pseudo-randomization is doing. As the technology matured, functional coverage took a central role in many verification flows, being considered a formalization of the test plan.
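
For readers less familiar with HVL flows, this is the kind of construct I mean; the class and field names below are made up purely for illustration:

class pkt_item;
  rand bit [1:0] kind;      // 0: data, 1: ctrl, 2: error, 3: idle
  rand bit [7:0] len;
endclass

class pkt_coverage;
  pkt_item it;

  covergroup pkt_cg;
    cp_kind    : coverpoint it.kind;
    cp_len     : coverpoint it.len {
      bins small  = {[1:16]};
      bins medium = {[17:128]};
      bins large  = {[129:255]};
    }
    kind_x_len : cross cp_kind, cp_len;   // which combinations were actually randomized?
  endgroup

  function new();
    pkt_cg = new();   // embedded covergroups are constructed in the class constructor
  endfunction

  function void sample_item(pkt_item t);
    it = t;
    pkt_cg.sample();
  endfunction
endclass

The cross shows which combinations the pseudo-random generator actually produced, which is exactly the rapid interpretation I am referring to.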

The industry (with a large marketing campaign from EDA companies) moved to a "coverage driven" verification flow. I think this is being recognized as a mistake by many leading (bleeding?) edge teams. There is another movement toward "test plan driven" verification, where feedback on the verification quality should result in improvements to the test plan (root cause analysis). This opens some exciting opportunities for really improving the verification process and reducing the cost. But it means getting away from a coverage-centric perspective.

We are certainly not encouraging the industry to move toward a "functional qualification driven" verification flow. But functional qualification is the first direct measurement of the bug detecting ability of verification and offers an opportunity to take a new look at how we drive verification; let's make test planning central.

In conclusion (sorry this is so long), there is perhaps a lot to be gained by questioning the current assumptions regarding coverage (both code and functional) and seeing whether functional qualification can bring benefits beyond just a complementary source of information. I think functional qualification is a fundamental change in how we learn to improve in verification and can professionalize the verification discipline, which would be a good thing for all of us.

But in any case I hope some debate can prove valuable - so thanks for your input.

Cheers,
Mark

Srinivasan Venkataramanan said...

Hi Mark,
Sorry for the delay in responding. First of all, THANKS for taking the time to respond with details about this new technology. I believe we need more of these "broadcasts/podcasts" to get the industry to listen to alternate means of achieving quality. So the more the merrier (even if it repeats in a few places). I have a few specific comments/clarifications. I am trying to understand your post in parts, so I am attempting a poor man's threaded blog commenting :-)



>> 1) Non-detected faults, i.e. faults that are activated by the input vectors and propagate to an observable location in the design (e.g. top level outputs), but where the checking is missing or not working correctly. It is impossible to find these types of problems with code coverage or functional coverage.

IMHO coverage is *not* intended for this in the first place. I fully agree this is a SERIOUS issue, and in fact this is precisely what we had seen in our chips earlier (I described a few such scenarios in my post). This is simply a flaw in the verification plan, and maybe in the test planning phase, to have missed specifying "what and how to check under these conditions". Not an easy thing to solve, sure. I can sort of imagine the value of Certitude here, but a BIG question for me is how many such faults a tool can realistically induce and qualify. I'm no DFT expert, but trying to achieve 100% on every possible mutation combination doesn't sound feasible for multi-million-gate ASICs to me; please correct me if I'm wrong. Or maybe this tool is better suited for block level?



And then there is an additional class of problems: "functionally OK, performance-wise not OK". I don't believe Certitude can do much there; if we miss some performance checkers, we are on our own…

>> 2) Non-propagated faults

This is certainly a worrying and interesting problem indeed. The promoters of ABV have been talking about it for a long time. It will be interesting to see if Certitude can generate SVA/PSL code in select cases (a la the old fishtail-da.com; BTW, does anyone have any update on that technology? I will start a separate thread anyway, sorry for the rambling).

But taking a step back: your view/expectation of coverage seems slightly different from mine. I see it as a "measure" of activation rather than detection of errors/bugs. The "activation" can be moved a level of abstraction higher via functional coverage, if used sensibly. So I don't see why "functional qualification" should bash coverage all the time :-) BTW, in case you haven't figured it out yet, I'm a (BIG?) fan of coverage for its own strengths.
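
To sketch the distinction I have in mind (all names below are made up; this is just my illustration, not anything tool-specific): a cover property records that a scenario was activated, while only a checker fails when the result is wrong.

module rd_wr_checks (
  input logic        clk,
  input logic        wr_en, rd_en, rd_valid,
  input logic [7:0]  wr_addr, rd_addr,
  input logic [31:0] rd_data, expected_data
);
  // Activation: this fires when the read-after-write scenario was exercised,
  // whether or not the data that came back was correct.
  cover property (@(posedge clk)
    (wr_en && wr_addr == 8'h10) ##[1:$] (rd_en && rd_addr == 8'h10));

  // Detection: only this check actually fails when the returned data is wrong.
  assert property (@(posedge clk)
    (rd_en && rd_valid) |-> (rd_data == expected_data))
    else $error("read data mismatch");
endmodule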

>> ...This is normal because they have tended to think in terms of coverage points, not paths from inputs to checkers; just changing this perception can make a big improvement for the next project.

I see your point, thanks for enlightening us with this new view/mindset.

>> When running Certitude, the product needs to run each testcase once (i.e. run a full regression) to collect information about the behavior of the testcases in the context of the design. This is the collection of the "activation" information.

While you are promoting replacing code coverage along the way, maybe there is a way to get this info from the available code coverage database? That would let us (the whole industry, I mean) leverage all the optimizations done so far in the code coverage collection process. Please note, I'm not an EDA developer, so I haven't really checked the feasibility of this, but it sounds reasonable to me from a user perspective.

>> "coverage driven" verification flow. I think this is being recognized as a mistake by many leading

I beg to differ on this, unless of course I see a whole industry certifying that!

>> But functional qualification is the first direct measurement of the bug detecting ability of verification

Verification environment/assertions?



>> and offers an opportunity to take a new look at how we drive verification; let's make test planning central.

And maybe also to complete the identification of the "full list of checkers". The next worry would be to code all those checkers; scope for a new EDA start-up (I am daydreaming) :-)

>> In conclusion (sorry this is so long)

I guess I need to repeat that “sorry” :-)

I would like to understand more about how many faults Certitude-like tools can detect in a modern full ASIC.

Warm Regards

Srinivasan

Mark Hampton said...

Hi Srinivasan,

I'll continue with the poor-man’s threaded blog-commenter :)

I think we both agree that measuring the checkers is valuable.

>>> I can sort of imagine the value of Certitude here, but a BIG question for me is how many such faults a tool can realistically induce and qualify.

One of the original things Certitude does is order the faults. The ordering means the tool typically only qualifies a very small fraction of the faults before finding verification weaknesses.

Regarding the design size, the standard technique is applicable to any block (or ASIC) where you are trying to verify the functionality of every statement. This is typically not the case for a full SoC. When simulating a full chip, it is often integration testing that is performed; in this case Certitude can insert connectivity faults at specific levels of the design hierarchy.
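
As a purely illustrative sketch (toy modules of my own invention, not Certitude's actual fault model), a connectivity fault at the integration level could be an interrupt line that is tied off instead of wired through; integration tests that never check that hookup would not notice:

module uart_stub (
  input  logic clk, rst_n,
  output logic irq
);
  logic [3:0] cnt;
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) cnt <= '0;
    else        cnt <= cnt + 1;
  assign irq = (cnt == 4'hF);   // pulse an interrupt every 16 cycles
endmodule

module irq_ctrl_stub (
  input  logic clk, rst_n,
  input  logic irq_in,
  output logic irq_pending
);
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n)      irq_pending <= 1'b0;
    else if (irq_in) irq_pending <= 1'b1;   // sticky pending bit
endmodule

module soc_top (
  input  logic clk, rst_n,
  output logic irq_pending
);
  logic uart_irq;

  uart_stub u_uart (.clk(clk), .rst_n(rst_n), .irq(uart_irq));

  irq_ctrl_stub u_irq_ctrl (
    .clk        (clk),
    .rst_n      (rst_n),
    .irq_in     (uart_irq),     // a connectivity fault could tie this off to 1'b0 instead
    .irq_pending(irq_pending)
  );
endmodule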


>>> And then there is an additional class of problems: "functionally OK, performance-wise not OK". I don't believe Certitude can do much there; if we miss some performance checkers, we are on our own…

Interestingly, Certitude can point out some performance issues. When faults are inserted in functionality that impacts performance, Certitude can highlight that performance is not being checked.
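
For example, the kind of performance check whose absence would be highlighted might be a latency bound like the hypothetical one below (my own sketch, not something Certitude produces):

module latency_check #(parameter int MAX_LAT = 8) (
  input logic clk, rst_n,
  input logic req, gnt
);
  // Every request must be granted within MAX_LAT cycles; otherwise the design
  // may be functionally correct but missing its performance target.
  assert property (@(posedge clk) disable iff (!rst_n)
    req |-> ##[1:MAX_LAT] gnt)
    else $error("request not granted within %0d cycles", MAX_LAT);
endmodule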

>> 2) Non-propagated faults

>>> This is certainly a worrying and interesting problem indeed. The promoters of ABV have been talking about it for a long time. It will be interesting to see if Certitude can generate SVA/PSL code in select cases

Certitude does not produce assertions. It only measures the verification of the design.

>>> But taking a step back: your view/expectation of coverage seems slightly different from mine. I see it as a "measure" of activation rather than detection of errors/bugs.

I think we are on the same page. Coverage is measuring activation. However, if asked how they measure verification, a lot of engineers would say they use coverage. So I am trying to highlight this misunderstanding. Coverage measures activation, not verification.

>>> The "activation" can be moved a level of abstraction higher via functional coverage, if used sensibly. So I don't see why "functional qualification" should bash coverage all the time :-)

I am going to get a bad reputation ;) Functional coverage is very useful for understanding the behavior of pseudo-random testcases. For example it can help debug testcase scenarios.

>> When running Certitude the product needs to run each testcase once (i.e. run a full regression) to collect information about the behavior of the testcases in the context of the design. This is the collection of the "activation" information

>>> While you are promoting replacing code coverage along the way,

Sorry, I am not being clear. I am not promoting that you replace coverage. Just start using Certitude and decide if you still want to invest the same effort in coverage :)

>>> maybe there is a way to get this info from the available code coverage database?

The information Certitude collects is specific to the nature of the faults.

>>> That would let us (the whole industry, I mean) leverage all the optimizations done so far in the code coverage collection process.

The simulators tend to be quite good at collecting block and branch coverage, but if you turn on all the coverage features, performance dives. Certitude is more efficient because it collects only the information it needs (which is more than block and branch).

>> "coverage driven" verification flow. I think this is being recognized as a mistake by many leading

>>> I beg to differ on this, unless of course I see a whole industry certifying that!

Well, the day the whole industry is doing something, the leading teams will probably have moved on to a different approach ;) I'm not saying there is a majority of people who think "coverage driven" should be replaced with "test plan driven". But I do think this is an emerging trend; I guess we will know in a few years...

>>> I would like to understand more about how many faults Certitude-like tools can detect in a modern full ASIC.

This is a bit like asking how long is a piece of string ;) The goal is not to drive the process by qualifying all the faults; this would take unreasonably long. By using the feedback from Certitude to do a "root cause analysis", you can efficiently make significant improvements to the verification. The fault ordering done automatically by Certitude is very important for maximizing the ROI.

Coverage metrics have tended to lead engineers (and managers) to think in terms of "100%". But in reality verification is a risk management activity, and Certitude highlights this. So rather than aiming for 100%, the goal becomes to continually improve the project's scores over time (as in any other quality-managed process).

Regards,
Mark