Potencyassay.com | A blog about bioassays, immunoassays, and other potency assays

June 9, 2010

Introduction to parallelism testing in potency assays

In a previous blog post I stated that a potency value doesn’t mean anything unless the curves of the reference standard and the unknown have exactly the same shape:

[Image]

This condition is known as parallelism (or, more correctly, mathematical similarity). While that may sound completely logical and simple, the real world (as always) is more complicated. When we estimate potency we have to rely on a statistical model of the underlying data. The four parameter logistic model is a common choice in potency assays. This model is fitted to our data using a regression algorithm of some sort.
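
As a minimal sketch of what the four parameter logistic (4PL) model looks like in code: the post does not give a specific parameterization, so the form below is an assumption (one commonly used version), and the parameter values are hypothetical and chosen only for illustration.

```python
def four_pl(x, bottom, top, slope, ec50):
    """Four parameter logistic response at concentration x (x > 0).

    bottom and top are the lower and upper asymptotes, slope controls
    the steepness, and ec50 is the concentration giving the midpoint.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** slope)


# Hypothetical parameter values, for illustration only.
bottom, top, slope, ec50 = 1000.0, 200000.0, 1.2, 5.0

for conc in (0.1, 1.0, 5.0, 25.0, 250.0):
    print(f"{conc:>7.1f}  {four_pl(conc, bottom, top, slope, ec50):>10.0f}")
```

At a concentration equal to the EC50, the response sits exactly halfway between the two asymptotes, which is a quick sanity check for any implementation.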

The raw data gathered in an assay and then used in the regression is a sampling of all of the possible data points at each dilution.  Because of this limitation, the model itself is only an estimate of the "true" underlying curves of our experimental system or assay. 

The consequence of this reality is that we have two estimated curves and we are trying to use them to tell if the underlying "true" curves (which we don’t know) are really parallel. That’s not an easy thing to do with certainty. For example, is either of these pairs of curves parallel? How do we know for sure?

[Figure: two pairs of estimated curves]

There are many different approaches available to answer this question.   The simplest method is just to look at the curves.  If you are doing investigational work and you’re fairly familiar with your assay, this may be all you need.  However, in a more regulated environment you will probably need something a little less subjective.  In general, there are two philosophies on how to measure parallelism using statistical methods:  difference testing and equivalence testing.  Let’s discuss each of these in more detail.

Difference testing

Difference testing relies on the creation of a metric for the measure of parallelism.  In theory, such a metric should scale with the degree of non-parallelism.  In other words, the less parallel the curves are, the larger the metric.

Let’s walk through how these metrics are derived.  In potency testing we fit two models to derive parallelism data, the full model and the reduced model.

In the full model, we fit independent parameters for our reference and sample curves. If we use a four parameter logistic model and estimate the upper asymptote, the lower asymptote, the slope, and the EC50 for each curve independently, we have a total of eight parameters. This model is illustrated in the first graph below. Notice how the two curves have different shapes: they clearly have different upper asymptotes and are therefore not parallel. Also notice how each curve fits its own underlying data fairly well. This is the "full" model:

[Figure: the full model]

In the reduced model, we only allow there to be one common set of upper and lower asymptotes and slope.  Only the EC50 parameter is allowed to be unique to each curve.  This situation is illustrated in the graph below.  It is this model we use to estimate potency.  Notice how the two curves have the same shape, but they don’t fit the data as well:

[Figure: the reduced model]
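
The reduced model can be sketched as follows. This assumes the same 4PL parameterization as a common textbook form (the post does not specify one), and it assumes relative potency is taken as the reference EC50 divided by the sample EC50. All numbers are hypothetical.

```python
def four_pl(x, bottom, top, slope, ec50):
    """Four parameter logistic response at concentration x (x > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** slope)


def reduced_model(x, bottom, top, slope, ec50_ref, ec50_sample):
    """Return (reference, sample) responses with shared asymptotes and slope.

    Only the EC50 differs between the curves, so the model has five
    parameters instead of the eight in the full model.
    """
    ref = four_pl(x, bottom, top, slope, ec50_ref)
    sample = four_pl(x, bottom, top, slope, ec50_sample)
    return ref, sample


def relative_potency(ec50_ref, ec50_sample):
    """Assumed convention: reference EC50 divided by sample EC50."""
    return ec50_ref / ec50_sample


# Hypothetical shared shape parameters and curve-specific EC50 values.
shared = dict(bottom=1000.0, top=200000.0, slope=1.2)
ec50_ref, ec50_sample = 5.0, 10.0
potency = relative_potency(ec50_ref, ec50_sample)

# With a common shape, the sample curve is the reference curve shifted
# along the concentration axis by the relative potency.
x = 4.0
_, sample_response = reduced_model(
    x, ec50_ref=ec50_ref, ec50_sample=ec50_sample, **shared
)
shifted_reference = four_pl(potency * x, ec50=ec50_ref, **shared)
print(potency, sample_response, shifted_reference)
```

The last two printed values agree, which is why a single potency number only makes sense when the curves share a shape: one horizontal shift maps the whole reference curve onto the sample curve.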

We can use these two graphs to generate a metric for parallelism.  The first thing we can do is to calculate what’s called the residual or error for each data-point.  The residual is simply the distance from the data point to the curve:

[Figure: residuals, the distance from each data point to the curve]

If we compare the two graphs of the full and reduced model above, it becomes obvious that, taken as a whole, the residuals in the full model are always going to be smaller than or equal to the residuals in the reduced model, because the full model has more freedom to follow the data. We can use this information to generate a metric.

First, let’s square all of the residuals (so that positive and negative residuals don’t cancel out) and sum those squares. This number is known as the sum of squared errors (SSE). If we do this for both models, we have two SSE values, one for the full model and one for the reduced model. Like I mentioned above, the SSE for the full model is always smaller than or equal to the SSE for the reduced model.
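
The SSE calculation itself is short. In this sketch the observed values and the two sets of fitted values are hypothetical numbers made up for illustration; they do not come from an actual regression.

```python
def sum_squared_errors(observed, predicted):
    """Sum of squared residuals between observations and fitted values."""
    return sum((obs - pred) ** 2 for obs, pred in zip(observed, predicted))


# Hypothetical responses and fitted values (not from a real fit).
observed = [10.0, 52.0, 148.0, 201.0]
full_fit = [11.0, 50.0, 150.0, 200.0]      # full model follows the data closely
reduced_fit = [14.0, 47.0, 153.0, 196.0]   # reduced model is more constrained

sse_full = sum_squared_errors(observed, full_fit)
sse_reduced = sum_squared_errors(observed, reduced_fit)
print(sse_full, sse_reduced)
```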

One common use of these metrics is to use an F-test for parallelism.  The following formula is used to calculate an F-statistic:

[Equation image: the F-statistic]
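
For reference, the standard extra-sum-of-squares F statistic, which is consistent with the description here (the SSE of the full model sits in the denominator), can be written as follows. Treat this as a sketch rather than the exact form of the original formula. The degrees-of-freedom symbols are assumptions: each df is the number of data points minus the number of fitted parameters, which is eight for the full 4PL model and five for the reduced one.

```latex
F = \frac{\left(SSE_{\text{reduced}} - SSE_{\text{full}}\right) / \left(df_{\text{reduced}} - df_{\text{full}}\right)}
         {SSE_{\text{full}} / df_{\text{full}}}
```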

As its name indicates, this statistic is distributed according to the F-distribution. We can therefore use this statistic to set up a hypothesis test of parallelism. The null hypothesis is that the curves are not different (notice that I didn’t say that they are the same), and the alternate hypothesis is that they are different. We then generate a p-value and compare it to a cutoff (usually 0.05). If the p-value is below the cutoff, we reject the null hypothesis and say that the curves are different and therefore not parallel.

However, as many different authors before me have noted, there’s a weakness to this approach that’s hard to overcome. In the equation above, the SSE for the full model appears in the denominator. So what happens if you have a very precise assay and the full model has a very low SSE? You are then dividing by a very small number and the F statistic gets very large. This situation can lead to false positives for lack of parallelism. In effect you are punished for having a very precise assay that follows the model very closely. The differences between the curves may be small, but your good assay was able to detect them. In a highly variable assay the opposite occurs: you will accept many more assays just because you don’t have the precision to tell if they are parallel or not.
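
The precision effect is easy to see with numbers. This sketch assumes the standard extra-sum-of-squares form of the F statistic, and every value in it (the plate layout, the SSE values) is hypothetical. Both runs lose the same amount of fit when a common shape is forced; only the background noise differs.

```python
def f_statistic(sse_reduced, sse_full, df_reduced, df_full):
    """Extra-sum-of-squares F statistic comparing reduced and full models."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Hypothetical design: 2 curves x 8 dilutions x 2 replicates = 32 points.
n_points = 32
df_full = n_points - 8      # eight parameters in the full 4PL model
df_reduced = n_points - 5   # five parameters in the reduced 4PL model

extra_sse = 30.0  # same loss of fit from forcing a common shape in both runs

f_noisy = f_statistic(600.0 + extra_sse, 600.0, df_reduced, df_full)
f_precise = f_statistic(60.0 + extra_sse, 60.0, df_reduced, df_full)
print(f_noisy, f_precise)
```

With 3 and 24 degrees of freedom, the 5% critical value of the F-distribution is roughly 3.0, so the precise run (F = 4.0) would be flagged as non-parallel while the noisy run (F = 0.4) would pass, even though the curves differ by the same amount.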

This situation has been remedied by the use of a chi-square statistic.  The formula for calculating it is as follows:

[Equation image: the chi-square statistic]
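
One common form of such a statistic, consistent with the description here (no division by the SSE of the full model), is the difference between the weighted SSE values of the two models. This is a sketch under assumptions and not necessarily the exact formula from the original post: the weights are taken to be the inverse of the response variance at each point, and the statistic is compared against a chi-square distribution whose degrees of freedom equal the number of parameters that were constrained (three for a 4PL model).

```latex
\chi^2 = SSE^{w}_{\text{reduced}} - SSE^{w}_{\text{full}},
\qquad
SSE^{w} = \sum_i w_i \left(y_i - \hat{y}_i\right)^2,
\qquad
w_i = \frac{1}{\sigma_i^2}
```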

Again, this statistic follows the distribution it’s named after. The same strategy we employed above can be used to set up a hypothesis test for parallelism using the chi-square metric. Since this metric doesn’t rely on dividing by the SSE of the full model, it doesn’t suffer from the same issues with assay precision that the F stat does.

Unfortunately, there are still some potential problems with using this approach. First, the regression has to be perfectly weighted in order for this stat to be perfectly chi-square distributed. Perfect weighting is difficult to achieve.
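
To make the role of the weights concrete, here is a sketch that assumes the chi-square metric is the difference in weighted SSE between the reduced and full models, with each squared residual divided by the response variance at that point. The variances and fitted values are hypothetical. In practice the variances have to be estimated, which is exactly where the weighting problem comes from.

```python
def weighted_sse(observed, predicted, variances):
    """Sum of squared residuals, each divided by its response variance."""
    return sum(
        (obs - pred) ** 2 / var
        for obs, pred, var in zip(observed, predicted, variances)
    )


# Hypothetical data: variance grows with the response level.
observed = [10.0, 52.0, 148.0, 201.0]
variances = [1.0, 4.0, 16.0, 25.0]
full_fit = [11.0, 50.0, 152.0, 196.0]
reduced_fit = [12.0, 48.0, 156.0, 191.0]

wsse_full = weighted_sse(observed, full_fit, variances)
wsse_reduced = weighted_sse(observed, reduced_fit, variances)
chi_square = wsse_reduced - wsse_full

# Tabulated 95th percentile of the chi-square distribution with 3 degrees
# of freedom (three constrained parameters in a 4PL model).
CHI2_CRITICAL_3DF_95 = 7.815

print(chi_square, chi_square > CHI2_CRITICAL_3DF_95)
```

If the assumed variances are too small, every weighted residual is inflated and the statistic rejects too often; if they are too large, it rejects too rarely.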

But beyond the weighting issue, there is also a philosophical problem with this approach.  These types of tests are measuring whether the shapes of the curves are different, but what we need to know for potency is whether they are actually the same.  Not being different is not the same as saying they are equivalent.  We may simply not have good enough information to tell that they are different.

Equivalence testing

Parallelism testing for potency assays has recently switched to focus on testing for curve equivalence rather than difference.  How does this work?  This approach requires us to set a limit on a specific assay parameter that we are willing to accept. 

For example, we can say that as long as the ratio of the slopes of the two curves is between 0.8 and 1.25 we will accept the assay. We can then fit the two curves independently (full model) and calculate a confidence interval on the slope ratio. If the confidence interval on this metric is contained within the two limits, we say that the curves are equivalent based on our criteria.
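
The decision rule can be sketched in a few lines. The 0.8 and 1.25 limits come from the example above; the confidence intervals are hypothetical, and how they are calculated from the fit is left out of this sketch.

```python
def is_equivalent(ci_lower, ci_upper, limit_lower=0.8, limit_upper=1.25):
    """True if the whole confidence interval lies inside the limits."""
    return limit_lower <= ci_lower and ci_upper <= limit_upper


# Hypothetical confidence intervals on the slope ratio from three runs.
precise_run = (0.93, 1.04)    # narrow interval, well inside the limits
variable_run = (0.70, 1.38)   # similar centre, but too wide to conclude
shifted_run = (1.18, 1.31)    # narrow, but crosses the upper limit

for name, ci in [
    ("precise", precise_run),
    ("variable", variable_run),
    ("shifted", shifted_run),
]:
    print(name, is_equivalent(*ci))
```

Only the precise run passes. The variable run fails because its interval is too wide to support a claim of equivalence, not because the curves were shown to differ.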

This type of test has two consequences.  First, we can say that the curves are actually equivalent instead of "not different".  Second, we are no longer punished if the assay is “too” precise, since all that will do is make our confidence interval shorter.   Let’s see what this looks like graphically:

[Figure: confidence intervals on the slope ratio compared with the equivalence limits]

As you can see, this type of test makes intuitive sense since we can set our limits based on our knowledge of the assay system without a large data set for determining statistically derived limits. It also prevents false positives. In difference testing you have to accept that you will reject some runs that were truly parallel based on chance alone. This is less likely in equivalence testing since you’re not doing a hypothesis test.

So why not use equivalence testing for everything? In a simple, linear assay, I would encourage this approach since it’s easy to calculate the confidence intervals for each parameter in the regression. However, in a non-linear regression the confidence intervals for the equation parameters cannot be solved independently. The joint confidence regions have very complex shapes and in some cases extend to infinity.

So for now, we are stuck with difference testing for more complex models.  I’ve recently heard about some interesting work being done that may solve this problem, but I’m sworn to secrecy…  As soon as this work is completed and published in a public forum, I will discuss it here on the blog.

I hope this has been a simple to understand introduction to parallelism testing.  If you want to read a little more about these topics, here are two journal articles I recommend to get you started:

http://www.ncbi.nlm.nih.gov/pubmed/15971545
PDA J Pharm Sci Technol. 2005 Mar-Apr;59(2):127-37.
Assessing parallelism prior to determining relative potency.
Hauck WW, Capen RC, Callahan JD, De Muth JE, Hsu H, Lansky D, Sajjadi NC, Seaver SS, Singer RR, Weisman D.

http://www.ncbi.nlm.nih.gov/pubmed/15920890
J Biopharm Stat. 2005;15(3):437-63.
Measuring parallelism, linearity, and relative potency in bioassay and immunoassay data.
Gottschalk PG, Dunn JR.

 

As always, thanks for reading!

Dan


11 Comments for Introduction to parallelism testing in potency assays

Harish | June 9, 2010 at 11:54 pm

Hi Dan,

I have started liking your blog :) Dan, I have a question for you regarding the relative potency determinations in one of our assays.

Normally in SoftMax Pro we get the individual values determined for a 4PL equation, namely lower asymptote, upper asymptote, slope and EC50. The relative potency is determined based on the EC50 for the standard and the test sample. The issue that I am having with our assay is that it is passing the slope criteria specifications but the EC50 values are all over the place, varying between 0.6 and 8.2. That said, my relative potencies always pass and are very accurate too.
I need your help in judging whether this variation is acceptable or not. As we have to use this assay for regulatory submissions, I need some expert opinion, as my previous interactions with consultants did not yield any information.

Awaiting your response
Harish

Author comment by Dan | June 10, 2010 at 8:52 am

Hi Harish,

Thanks for the kind words. I’m glad you are getting some value from the blog.

It’s hard for me to say whether this variation in your assay is acceptable or not without seeing data and knowing more about your assay.

One of the main reasons for running relative potency assays instead of just using absolute EC50 ratios is exactly to control this kind of variability. Since you say your relative potencies are very accurate, it sounds like you have some plate to plate variation and the intra-plate variation is quite small.

In general, my approach is to set acceptance criteria on curve fit (usually based on residual variance or similar). This is to ensure that the data is actually fitting the model. I will also set stringent criteria on parallelism so that I have confidence in my potency number.

For the other assay parameters, I usually just trend them over time to make sure there are no changes in the assay. Especially so early on in assay development. I can then use this experience during assay validation and transfer.

So my advice to you would be to track the EC50 parameter and see how much it varies. If you think the variation is too high, you need to do some experiments to figure out if you can control the variation a little better. Using DOE is always a good idea so that you don’t miss interactions.

Performing a variance component analysis is also very important so that you can characterize the sources of variability in your system.

I hope that was somewhat helpful,

Dan

RRI | September 7, 2010 at 9:00 am

Hello Dan,
I’ve a question and hope you can help me.
For potency calculations we analyse our data in GraphPad Prism (4-parameter fit). First we look at the full model.
If the Top, Bottom and HillSlope of the sample curve don’t deviate more than 15% (plus or minus) from the Top, Bottom and HillSlope values of the reference, we constrain these parameters with “Shared value for both data sets” to generate a reduced model with parallel graphs with unique EC50 values.
In your full model example the top of the red curve (let’s call it the reference) is 220000 and that of the black curve (sample) is 190000. So, the 15% acceptance range around the top of the ref is [187000 - 253000]. The top of the sample meets these specifications, so it is allowed to constrain the top values. But when is constraining no longer allowed? If the top of the black line was 100000, for example, is it then still allowed to constrain the tops of both curves? I can imagine there is a limit, but what are the official rules? I use the 15% rule, but is that legal/correct?

Hope to hear from you soon.

Author comment by Dan | September 9, 2010 at 12:47 pm

Dear RRI,

Thanks for your comment.

The short answer to your question is “it depends…” Unfortunately, with potency assays there are no established rules for setting cutoffs on acceptance criteria.

The approach you are using is similar to the equivalence approach I discuss in the post above. However, the appropriate way to do it is to use the confidence interval on the ratio instead of just the mean. That’s because the assay may be highly variable in the asymptote estimate, but the mean just happens to fall between your limits. In my example, a 15% difference would clearly not be appropriate since the assay variability is very low. The curves are clearly different.

I personally don’t like the approach of using limits that are based on % differences, because we rarely have any information on what is actually meaningful (which is the heart of your question). In an ideal world, if you had two lots that differ by 15% and both of them were safe and efficacious in the clinic, you can then say that yes, the difference isn’t meaningful. Unfortunately, that kind of data is rarely available.

Instead, I generally use an assay capability approach. Basically, you determine what the expected variability of your parameter is and then set a cutoff at some predetermined limit (usually at 95 or 99%). Anything above this limit is probably a real difference and should be investigated. Of course this approach assumes that your assay is precise enough to find differences that are significant in the real world. What this difference is has to be decided based on the needs and biology of each individual product. Ideally, the potency assay should not be used in isolation, but can be used as a triggering assay to investigate the product using tangential methods.

I’m sorry there’s no easy answer for you, but this is what makes potency assays interesting to work with!

Let me know if that makes sense…

Dan

RRI | September 10, 2010 at 8:31 am

Correct me if I’m wrong, but in general we can better say:
Check the 95% CI of the HillSlope and of the upper and lower asymptote (Top and Bottom) of the reference.
If the HillSlope, Top and Bottom of the sample fall within these 95% intervals, it is allowed to constrain these 3 parameters (HillSlope, Top, Bottom) to compare the EC50 value of the sample and the ref.
If, after constraining, the EC50 of the sample falls within the 95% CI of the EC50 of the reference, the sample is not significantly different from the ref.
Or, is it NOT necessary to constrain to calculate the potency with the EC50 values?

Author comment by Dan | September 10, 2010 at 1:36 pm

You’re absolutely right in that you need to check the similarity of the curves before you calculate potency using the constrained model. The problem lies in finding the confidence intervals for all the parameters at the same time. It has been shown that sometimes calculating the joint confidence region for all of the parameters is not possible with current mathematical tools (unless you’re dealing with a linear model). That’s why other metrics, such as the chi-square metric are often used instead.

To answer your last question, yes, it is necessary to constrain the other parameters when calculating potency using the relative EC50 values.

I hope that helps, otherwise, keep posting comments!

Thanks!
Dan

zy | September 30, 2010 at 5:08 pm

Hi,Dan.

I am new to bioassay field. I am supposed to develop a cell based potency assay. Which factor is more important, R square or the signal to noise ratio for optimization? I am having trouble on the accuracy and precision. Any suggestions on how to improve accuracy and precision of a cell based assay? thanks!

XYZ | November 11, 2010 at 6:00 pm

Could you please explain the relationship between relative potency and estimated potency for a PLA assay?
Stated potency and estimated potency values and their limitations (according to confidence intervals) are given in EP monographs of active ingredients. Could you confirm the sentences below? (Estimated potency is not less than 75% and not more than 125% of stated potency: EP monograph sentences.) Does it mean the relative potency of the test product should be 0.75-1.25 in PLA?

Steve | December 26, 2010 at 8:24 am

Very interesting! Merry Christmas and a happy new year!

Sheri | May 17, 2011 at 11:10 am

Hi Dan,
Thank you so much for sharing your knowledge of potency assays. I have been developing a potency assay for a vaccine for several years now and I always struggled with the statistical analyses (I could get the “answer” but I didn’t fully understand all of the models). Your explanation of parallel testing was fantastic! I finally understand the fundamentals and differences between equivalence testing and difference testing! It all seems so simple now – I guess I just needed your explanation. A simple “thank you” does not seem sufficient to express how much I appreciate what you are doing with this blog.
In all seriousness – Thank you! Sheri

Joey | October 11, 2011 at 3:14 am

Is there software I can download for free from the internet to perform a parallel line potency bioassay?
