A Simple Dose-Response Model
This vignette describes how to set up a simple non-linear regression model in order to find a Bayesian D-optimal design for estimating the model parameter.
Model
Suppose we want to investigate the dose-response relationship between a drug and some clinical outcome. A commonly used model for this task is the sigmoid Emax model: an s-shaped curve with four parameters. Assuming independent measurement errors, the corresponding regression model for a total of $\SampleSize$ observations at $\NumDesignPoints$ different dose levels $\Covariate_1,\dots,\Covariate_{\NumDesignPoints}$ is
\[\Unit_{\IndexUnit} \mid \Parameter \overset{\mathrm{iid}}{\sim} \mathrm{Normal}(\MeanFunction(\Covariate_{\IndexDesignPoint}, \Parameter), \sigma^2) \quad \text{for all } \IndexUnit \in I_{\IndexDesignPoint}, \IndexDesignPoint = 1, \dots, \NumDesignPoints,\]
with expected response at dose $\Covariate$
\[\MeanFunction(\Covariate, \Parameter) = E_0 + E_{\max{}} \frac{\Covariate^h}{\mathrm{ED}_{50}^h + \Covariate^h}\]
and a four-element parameter vector $\Parameter = (E_0, E_{\max{}}, \mathrm{ED}_{50}, h)$. Here, $E_0$ is the baseline response, $E_{\max{}}$ is the maximum effect above (or below) baseline, $\mathrm{ED}_{50} > 0$ is the dose at which half of the maximum effect is attained, and $h > 0$ controls the steepness of the increasing (or decreasing) part of the curve.
using Plots
# an example parameter value for visualizing the mean function
θ = (e0 = 1, emax = 2, ed50 = 0.4, h = 5)
pse = plot(
    x -> θ.e0 + θ.emax * x^θ.h / (θ.ed50^θ.h + x^θ.h);
    xlims = (0, 1),
    xguide = "dose",
    yguide = "response",
    label = "μ(dose, θ)",
)
In order to find an optimal design, we need to know the Jacobian matrix of $\MeanFunction(\Covariate, \Parameter)$ with respect to $\Parameter$. Calculating manually, or using a computer algebra system such as Maxima or WolframAlpha, we find
\[\begin{aligned} \partial_{E_0} \MeanFunction(\Covariate, \Parameter) &= 1 \\ \partial_{E_{\max{}}} \MeanFunction(\Covariate, \Parameter) &= \frac{\Covariate^h}{\mathrm{ED}_{50}^h + \Covariate^h}\\ \partial_{\mathrm{ED}_{50}} \MeanFunction(\Covariate, \Parameter) &= -\frac{h E_{\max{}} \Covariate^h \mathrm{ED}_{50}^{h-1}}{(\mathrm{ED}_{50}^h + \Covariate^h)^2} \\ \partial_{h} \MeanFunction(\Covariate, \Parameter) &= \frac{E_{\max{}} \Covariate^h \mathrm{ED}_{50}^h \log(\Covariate / \mathrm{ED}_{50})}{(\mathrm{ED}_{50}^h + \Covariate^h)^2} \\ \end{aligned}\]
for $\Covariate \neq 0$, and the limit
\[\lim_{\Covariate\to0} \partial_{h} \MeanFunction(\Covariate, \Parameter) = 0.\]
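As a quick sanity check of these formulas (note the negative sign of the derivative in $\mathrm{ED}_{50}$), we can compare them to central finite differences at one interior dose. This is a plain-Julia sketch; mu_check and all numbers here are ad hoc and not part of Kirstine.jl.
# ad hoc mean function and parameter values for the check
mu_check(x, e0, emax, ed50, h) = e0 + emax * x^h / (ed50^h + x^h)
x, e0, emax, ed50, h = 0.3, 1.0, 2.0, 0.4, 5.0
delta = 1e-6
# derivative in ED50: analytic formula vs central difference
an_ed50 = -h * emax * x^h * ed50^(h - 1) / (ed50^h + x^h)^2
fd_ed50 = (mu_check(x, e0, emax, ed50 + delta, h) - mu_check(x, e0, emax, ed50 - delta, h)) / (2delta)
# derivative in h: analytic formula vs central difference
an_h = emax * x^h * ed50^h * log(x / ed50) / (ed50^h + x^h)^2
fd_h = (mu_check(x, e0, emax, ed50, h + delta) - mu_check(x, e0, emax, ed50, h - delta)) / (2delta)
an_ed50 ≈ fd_ed50 && an_h ≈ fd_h # true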
Setup
For specifying a model and a design problem, Kirstine.jl makes use of Julia's extensible type system and multiple dispatch. If your working knowledge of these is rusty, now would be a good time to refresh it.
Model and Design Region
In Kirstine.jl, the regression model is defined by subtyping Kirstine.NonlinearRegression, Kirstine.Covariate, Kirstine.Parameter, and Kirstine.CovariateParameterization, and implementing a handful of methods for them.
In the sigmoid Emax model, a single unit of observation is just a real number $\Unit_{\IndexUnit}$. For such a case, we can use the helper macro @simple_model to declare a model type named SigEmaxModel, and a covariate type named SigEmaxCovariate with the single field dose.
using Kirstine
@simple_model SigEmax dose
The model parameter $\Parameter$ is just a vector with no additional structure, which is why we can use the helper macro @simple_parameter to define a parameter type SigEmaxParameter with the Float64 fields e0, emax, ed50, and h.
@simple_parameter SigEmax e0 emax ed50 h
(The @simple_model and @simple_parameter macros also set up methods for a few helper functions that dispatch on these newly defined types. The details are in the macros' docstrings and the API docs. They are not relevant for this tutorial, but you should look them up when you need to implement a more complex model, like the one in the dose-time-response example.)
Next we implement the Jacobian matrix. We do this by defining a method for Kirstine.jacobianmatrix! that specializes on our newly defined model, covariate, and parameter types. This method will later be called from the package internals with a pre-allocated matrix jm (of size (1, 4)). Now our job is to fill in the correct values:
function Kirstine.jacobianmatrix!(
    jm,
    m::SigEmaxModel,
    c::SigEmaxCovariate,
    p::SigEmaxParameter,
)
    # compute the expensive exponentials only once
    dose_pow_h = c.dose^p.h
    ed50_pow_h = p.ed50^p.h
    A = dose_pow_h / (dose_pow_h + ed50_pow_h)
    B = ed50_pow_h * p.emax / (dose_pow_h + ed50_pow_h)
    jm[1, 1] = 1.0
    jm[1, 2] = A
    jm[1, 3] = -A * B * p.h / p.ed50
    # at dose 0, use the analytic limit of the derivative in h
    jm[1, 4] = c.dose == 0 ? 0.0 : A * B * log(c.dose / p.ed50)
    return jm
end
Note that for defining the design problem we do not need to implement the mean function $\MeanFunction$ itself, but only its Jacobian matrix $\TotalDiff\MeanFunction$ with respect to $\Parameter$.
The jacobianmatrix! function is the main place for achieving efficiency gains. Note how we have re-arranged the derivatives such that the expensive exponentials are computed only once. (We also follow the convention of returning the modified argument jm.)
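For illustration, the new method can also be called directly with a pre-allocated matrix, just as the package internals will do. This is a sketch: it assumes that SigEmaxCovariate has the default positional constructor; check the @simple_model docstring.
jm = zeros(1, 4)
Kirstine.jacobianmatrix!(
    jm,
    SigEmaxModel(sigma = 1),
    SigEmaxCovariate(0.5), # positional constructor assumed here
    SigEmaxParameter(e0 = 1, emax = 2, ed50 = 0.4, h = 5),
)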
Next, we set up the design region. Let's say we want to allow doses between 0 and 1 (in some unit).
dr = DesignInterval(:dose => (0, 1))
A design will then be a discrete probability measure with atoms (design points) from dr, and a design point is represented by a Vector{Float64}. In our simple model, these vectors have length 1.
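For illustration, a design measure on dr could also be written down by hand as designpoint => weight pairs, the same syntax in which design measures are printed further below. (This constructor call is a sketch; check the DesignMeasure docstring.)
# a hypothetical 3-point design on dr, written as designpoint => weight pairs
d_example = DesignMeasure([0.0] => 0.25, [0.5] => 0.5, [1.0] => 0.25)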
Finally, we need to specify how a design point maps to a SigEmaxCovariate. Here, the design region is simply the interval of possible doses. This means that we can just copy the only element of the design point into the covariate's dose field. For this we can use JustCopy, which is a subtype of Kirstine.CovariateParameterization.
cp = JustCopy(:dose)
Prior Knowledge
For Bayesian optimal design of experiments we need to specify a prior distribution on $\Parameter$. Kirstine.jl then needs a sample from this distribution.
For this example we use independent normal priors on the elements of $\Parameter$. We first draw a vector of SigEmaxParameter values, which we then wrap into a PriorSample.
using Random
Random.seed!(31415)
theta_mean = [1, 2, 0.4, 5]
theta_sd = [0.5, 0.5, 0.05, 0.5]
sep_draws = map(1:1000) do i
    rnd = theta_mean .+ theta_sd .* randn(4)
    return SigEmaxParameter(e0 = rnd[1], emax = rnd[2], ed50 = rnd[3], h = rnd[4])
end
prior_sample = PriorSample(sep_draws)
Note that the SigEmaxParameter constructor takes keyword arguments.
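The draws could equally well come from any other source. As a sketch, here is the same prior sample written with Distributions.jl (an extra dependency, not required by Kirstine.jl):
using Distributions
# a vector of four independent normal priors, one per parameter element
prior_dists = Normal.(theta_mean, theta_sd)
sep_draws_alt = map(1:1000) do i
    rnd = rand.(prior_dists) # one draw from each prior
    return SigEmaxParameter(e0 = rnd[1], emax = rnd[2], ed50 = rnd[3], h = rnd[4])
end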
Design Problem
Now we collect all the parts in a DesignProblem.
dp = DesignProblem(
    criterion = DCriterion(),
    region = dr,
    model = SigEmaxModel(sigma = 1),
    covariate_parameterization = cp,
    prior_knowledge = prior_sample,
)
Note that the SigEmaxModel constructor takes the measurement standard deviation $\sigma$ as an argument. For D-optimality, this only changes the objective function by a constant and has no influence on the optimal design. This is why we simply set it to 1 here.
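A sketch of the standard argument (using $\xi$ for a design with weights $w_{\IndexDesignPoint}$ at the points $\Covariate_{\IndexDesignPoint}$; this notation is not needed elsewhere in the vignette): the normalized information matrix is
\[\mathrm{M}(\xi, \Parameter) = \frac{1}{\sigma^2} \sum_{\IndexDesignPoint=1}^{\NumDesignPoints} w_{\IndexDesignPoint} \TotalDiff\MeanFunction(\Covariate_{\IndexDesignPoint}, \Parameter)^\top \TotalDiff\MeanFunction(\Covariate_{\IndexDesignPoint}, \Parameter),\]
and since this is a $4\times4$ matrix, $\log\det \mathrm{M}(\xi, \Parameter) = \log\det\bigl(\sigma^2 \mathrm{M}(\xi, \Parameter)\bigr) - 8 \log \sigma$. The $\sigma$ term is an additive constant that does not depend on $\xi$, and it remains constant after averaging over the prior.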
Optimal Design
Kirstine.jl provides several high-level Kirstine.ProblemSolvingStrategy types for solving the design problem dp. A very simple but often quite effective one is direct maximization with a particle swarm optimizer.
Apart from the Pso parameters, the DirectMaximization strategy needs to know how many points the design measure should have. This is indirectly specified by the prototype argument: it takes a DesignMeasure which is then used for initializing the swarm. One particle is set exactly to prototype. The remaining ones have the same number of design points, but their weights and design points are completely randomized. For our example, we use a prototype with 8 equally spaced design points.
strategy = DirectMaximization(
    optimizer = Pso(iterations = 50, swarmsize = 100),
    prototype = equidistant_design(dr, 8),
)
Random.seed!(54321)
s1, r1 = solve(dp, strategy)
The solve function returns two objects:
s1: a slightly post-processed DesignMeasure
r1: a Kirstine.DirectMaximizationResult that contains diagnostic information about the optimization run.
We can access the lower-level Kirstine.OptimizationResult by calling optimization_result on r1, which will show some statistics when printed:
optimization_result(r1)
OptimizationResult
* iterations: 50
* objective evaluations: 5000
* gradient evaluations: 0
* elapsed time: 0:00:09.127
* traced states: 0
* state type: Kirstine.PsoState{DesignMeasure, Kirstine.SignedMeasure}
* maximum: -6.7206347777662705
* maximizer: DesignMeasure(
[0.0] => 0.13404889547477042,
[0.28680117509990044] => 0.1385323232912528,
[0.35590085349128153] => 0.024903749604217627,
[0.3567666844285769] => 0.12742156052917672,
[0.49982025315652223] => 0.22264826193961815,
[0.010691842729708925] => 0.1077006446024656,
[1.0] => 0.1259454888888293,
[1.0] => 0.11879907566966932,
)
The solution s1 is displayed as designpoint => weight pairs.
s1
DesignMeasure(
[0.0] => 0.13404889547477042,
[0.010691842729708925] => 0.1077006446024656,
[0.28680117509990044] => 0.1385323232912528,
[0.35590085349128153] => 0.024903749604217627,
[0.3567666844285769] => 0.12742156052917672,
[0.49982025315652223] => 0.22264826193961815,
[1.0] => 0.24474456455849863,
)
Looking closely at s1, we notice that two design points are nearly identical:
[0.35590085349128153] => 0.024903749604217627
[0.3567666844285769] => 0.12742156052917672
It seems plausible that they would merge into a single one if we ran the optimizer for some more iterations. But we can also do this after the fact by calling simplify on the solution. This merges all design points that are less than some maximum distance apart, and removes all design points with negligible weight.
s2 = sort_points(simplify(s1, dp, maxweight = 1e-3, maxdist = 2e-2))
DesignMeasure(
[0.004763270091889071] => 0.24174954007723604,
[0.28680117509990044] => 0.1385323232912528,
[0.3566251292474773] => 0.15232531013339434,
[0.49982025315652223] => 0.22264826193961815,
[1.0] => 0.24474456455849863,
)
Because this issue occurs frequently, we can pass the simplification arguments directly to solve:
Random.seed!(54321)
s2, r2 = solve(dp, strategy; maxweight = 1e-3, maxdist = 2e-2)
The original, unsimplified solution can still be accessed via solution(r2).
In order to confirm that we have actually found the solution, we visually check that the Gateaux derivative is non-positive over the whole design region. In addition, the design points of the solution are indicated together with their weights. (The plot_gateauxderivative function has a keyword argument to configure the point labels.)
gd = plot_gateauxderivative(s2, dp)
To visualize where the design points end up on the prior expected mean function, we can use plot_expected_function.
mu(dose, p) = p.e0 + p.emax * dose^p.h / (p.ed50^p.h + dose^p.h)
ef = plot_expected_function(
    (x, c, p) -> mu(x, p), # response as a function of x, for fixed covariate (ignored here) and parameter
    c -> [c.dose], # x-coordinate(s) for a point
    (c, p) -> [mu(c.dose, p)], # y-coordinate(s) for a point
    0:0.01:1, # x-axis grid
    s2,
    model(dp),
    covariate_parameterization(dp),
    prior_knowledge(dp);
    xguide = "dose",
    yguide = "response",
    bands = true, # prior credibility bands, by default 80%
)
Finally, we can also look at a plot of the optimization progress and see that the particle swarm had already converged after about 20 iterations.
pr1 = plot(r1)
Relative Efficiency and Apportionment
In order to be able to estimate $\Parameter$ at all, we need a design with at least 4 points. Let's compare how much better the optimal solution s2 is than a 4-point equidistant_design.
efficiency(equidistant_design(dr, 4), s2, dp)
0.7269868089969024
Their relative D-efficiency means that s2 on average needs only about 0.73 times as many observations as the equidistant design with 4 points in order to achieve the same estimation accuracy.
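A sketch of why efficiency translates into sample size (stated here for the non-Bayesian D-criterion; the Bayesian version averages over the prior): with $\SampleSize$ observations, the information matrix is $\SampleSize \, \mathrm{M}(\xi, \Parameter)$, so a design $\xi_1$ with $\SampleSize_1$ observations and a design $\xi_2$ with $\SampleSize_2$ observations are equally informative when
\[\SampleSize_1^4 \det \mathrm{M}(\xi_1, \Parameter) = \SampleSize_2^4 \det \mathrm{M}(\xi_2, \Parameter), \quad\text{i.e.}\quad \frac{\SampleSize_2}{\SampleSize_1} = \left(\frac{\det \mathrm{M}(\xi_1, \Parameter)}{\det \mathrm{M}(\xi_2, \Parameter)}\right)^{1/4},\]
and the right-hand side is the relative D-efficiency of $\xi_1$ with respect to $\xi_2$: here about $0.73$.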
Suppose we now actually want to run the experiment on $\SampleSize=42$ units. For this we need to convert the weights of s2 to integers that add up to $\SampleSize$. This is achieved with the apportion function:
app = apportion(s2, 42)
5-element Vector{Int64}:
10
6
7
9
10
This tells us to take 10 measurements at the first design point, 6 at the second, 7 at the third, and so on.
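To pair each design point with its number of observations, we can zip the points of s2 with app. This sketch assumes an accessor named points for the design points of a DesignMeasure, analogous to sort_points above; check the API docs for the actual name.
# pair design points with their apportioned observation counts
# (the accessor `points` is an assumption; see the API docs)
collect(zip(points(s2), app))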