Bayesian vs Frequentist Power Functions to Determine the Optimal Sample Size: Testing One Sample Binomial Proportion Using Exact Methods

In order to avoid the drawbacks of sample size determination procedures based on classical power analysis, it is possible to define analogous criteria based on 'hybrid classical-Bayesian' or 'fully Bayesian' approaches. We review these conditional and predictive procedures and provide an application in which the focus is on a binomial model and the analysis is performed through exact methods. The distinction between analysis and design prior distributions is essential for the practical implementation of the criteria: some guidelines for choosing these priors are discussed, and their impact on the required sample size is examined.


Introduction
The calculation of an adequate sample size is a crucial aspect of the design of experiments. Researchers need to select the number of participants required to ensure ethically and scientifically valid results. If samples are too large, time and resources are wasted, often for minimal gain. On the other hand, samples that are too small may lead to inaccurate results. Therefore, sample size determination (SSD) plays a very important role in the design of studies in many fields, especially in the context of clinical trials where, in addition to economic issues, investigators have to deal with important ethical implications.
Sample size determination (SSD) methods, when the focus is on hypothesis testing, are typically related to the concept of power function. Let us denote the parameter of interest by $\theta$ and let us assume that we are interested in testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ form a partition of the parameter space $\Theta$. The most widely used frequentist SSD criterion consists in choosing the minimal sample size that guarantees a given power, for a fixed type I error rate, under the assumption that $\theta$ is equal to a suitable design value, $\theta_D \in \Theta_1$. In practice, the idea is to ensure a sufficiently large probability of obtaining a statistically significant result (i.e. of rejecting the null hypothesis) when the true value of $\theta$ belongs to the alternative hypothesis and is equal to $\theta_D$. Many textbooks (see [1][2][3], among others) provide sample size formulas derived using this procedure for commonly occurring situations, covering different hypothesis tests and both categorical and quantitative data.
In the frequentist criterion described above, a crucial role is played by the design value that the trial is designed to detect with high probability, whose uncertainty is not accounted for. In fact, this local optimality is one of the most criticized aspects of the method. Moreover, the frequentist procedure does not allow pre-experimental information about $\theta$, for instance available from previous studies, to be taken into account. By adopting a 'hybrid classical-Bayesian approach' or a 'fully Bayesian approach', it is possible to define analogous criteria for sample size selection that allow the researcher to avoid the problem of local optimality and/or to introduce prior information into the SSD process.
In this chapter, we illustrate how to construct frequentist and Bayesian power functions, based on both conditional and predictive approaches, and how to use them to determine the optimal sample size. An essential element of the method is the use of two different prior distributions for the parameter of interest, which play two distinct roles in the criteria. The importance of this distinction in sample size determination problems has been stressed by several authors (see, for instance, [4][5][6][7][8][9] among others). The rest of the chapter is organized as follows: in Section 2, we review both the frequentist conditional and predictive procedures based on power analysis to determine the optimal sample size. Section 3 provides a description of analogous methods based on Bayesian power functions. Then, in Section 4, we formalize different SSD criteria that depend on the shape of the power curves as a function of the sample size and, as a consequence, on the nature of the data distributions. Furthermore, in Section 5, we illustrate an application of the frequentist and Bayesian SSD procedures, when the parameter of interest is a single binomial proportion. Finally, Section 6 contains a brief final discussion.

Frequentist power functions and SSD methods
Let us consider a parameter of interest $\theta$ and assume that we are interested in testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ form a partition of the parameter space $\Theta$. Moreover, let $Y_n$ be the random result of the experiment, typically a suitable statistic used to summarize the data relevant to the parameter $\theta$. The notation highlights that $Y_n$ depends on the sample size $n$. Finally, we denote by $f_n(y_n \mid \theta)$ the sampling distribution of $Y_n$.
The power function is defined as the probability of obtaining a statistically significant result, that is a result that leads to the rejection of the null hypothesis $H_0$, when the actual value of the parameter is $\theta$. In a frequentist approach, the investigator is first required to specify a fixed level $\alpha$ for the type I error probability that one is willing to tolerate. This significance level is typically set equal to 0.05 and is used to obtain the rejection region of $H_0$, denoted by $R_{H_0}$, which is the subset of outcomes that, if observed, lead to the rejection of $H_0$. Therefore, given a frequentist test of size $\alpha$, $Y_n$ is considered a statistically significant result if it belongs to $R_{H_0}$. Consequently, in general terms, the power function is defined as

$$\eta(\theta) = P_{\theta}\left(Y_n \in R_{H_0}\right), \tag{1}$$

where $P_\theta$ is the probability measure associated with a suitable distribution of $Y_n$.
In order to exploit the frequentist power function in Eq. (1) for sample size determination purposes, investigators can adopt two different approaches: the conditional and the predictive one. The conditional approach is certainly the most widely known and used when performing sample size calculations based on pre-study power analysis. It requires the specification of a suitable design value for $\theta$, denoted by $\theta_D$, that belongs to the alternative hypothesis and is considered a relevant value, important to detect. By assuming that the true value of the parameter is equal to $\theta_D$, we obtain the frequentist conditional power given by

$$\eta_F^C(n; \theta_D) = P_{f_n(\cdot \mid \theta_D)}\left(Y_n \in R_{H_0}\right), \tag{2}$$

where $P_{f_n(\cdot \mid \theta_D)}$ is the probability measure associated with the sampling distribution of $Y_n$ when $\theta = \theta_D$. Since $\theta_D$ has to be selected within the subspace $\Theta_1$, the conditional frequentist power can be interpreted as the probability of correctly rejecting $H_0$ when the true value of the parameter belongs to the alternative hypothesis and is exactly equal to $\theta_D$. The sample size determination criterion then consists in choosing the minimal sample size that guarantees a desired level for $\eta_F^C(n; \theta_D)$.
The SSD procedure based on the power function in Eq. (2) is strongly affected by the choice of $\theta_D$. In order to account for uncertainty in the specification of the design value and to avoid local optimality, it is natural to incorporate Bayesian concepts into the sample size determination process. By adopting a 'hybrid classical-Bayesian approach', it is possible to model uncertainty on the appropriate design value for $\theta$ through the elicitation of a prior distribution, denoted by $\pi_D(\theta)$ and called design prior. This prior is used to compute the marginal or prior predictive distribution of the data by averaging the sampling distribution as follows:

$$m_n^D(y_n) = \int_{\Theta} f_n(y_n \mid \theta)\, \pi_D(\theta)\, d\theta. \tag{3}$$

Therefore, the design prior cannot be a non-informative improper distribution, otherwise $m_n^D(y_n)$ would not be well defined. In any case, the elicitation of a non-informative $\pi_D(\theta)$ would not be a reasonable choice. In fact, the design prior is used to introduce uncertainty on the design value for $\theta$ that we need to specify when using the SSD procedure previously described, and the possible guessed values have to belong to the subspace $\Theta_1$. Thus, $\pi_D(\theta)$ serves to describe a design scenario of interest that supports values of $\theta$ under the alternative hypothesis: it has to be an informative distribution that assigns a negligible probability to values of $\theta$ under the null hypothesis.
Once the design prior has been elicited, the idea is to average the conditional frequentist power with respect to it by computing

$$\int_{\Theta} \eta_F^C(n; \theta)\, \pi_D(\theta)\, d\theta. \tag{4}$$

This leads to the frequentist predictive power that is given by

$$\eta_F^P(n; \pi_D) = P_{m_n^D(\cdot)}\left(Y_n \in R_{H_0}\right), \tag{5}$$

where $P_{m_n^D(\cdot)}$ is the probability measure associated with the marginal distribution of $Y_n$ obtained using $\pi_D(\theta)$. The power function in Eq. (5) expresses the probability of making a correct decision by rejecting $H_0$ when $\theta$ actually belongs to the subspace defined under the alternative hypothesis, where we can assume that it is distributed according to the design prior. Therefore, the corresponding SSD criterion requires selecting the minimum $n$ that achieves a desired level for $\eta_F^P(n; \pi_D)$.
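The equivalence between averaging the conditional power, Eq. (4), and computing the rejection probability under the marginal distribution, Eq. (5), can be checked numerically. The following is a minimal Python sketch under assumptions that are not part of this section: it borrows the binomial sampling model and a beta design prior (the setting of the application later in the chapter), takes the rejection region to be {y ≥ r} for an arbitrary fixed r, and uses hypothetical function names and a plain Simpson rule.

```python
from math import comb, exp, lgamma, log


def binom_upper_tail(r, n, theta):
    """P(Y_n >= r) for Y_n ~ Bin(n, theta): conditional power when R = {y >= r}."""
    return sum(comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(r, n + 1))


def beta_pdf(theta, a, b):
    """Beta(a, b) density; returns 0 at the endpoints, which is right only if a, b > 1."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))


def averaged_conditional_power(r, n, a, b, steps=2000):
    """Eq. (4): conditional power integrated against the design prior (Simpson rule)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        theta = i * h
        total += weight * binom_upper_tail(r, n, theta) * beta_pdf(theta, a, b)
    return total * h / 3.0


def beta_binom_upper_tail(r, n, a, b):
    """Eq. (5): P(Y_n >= r) under the beta-binomial marginal distribution."""
    def log_beta(p, q):
        return lgamma(p) + lgamma(q) - lgamma(p + q)
    return sum(exp(log(comb(n, y)) + log_beta(y + a, n - y + b) - log_beta(a, b))
               for y in range(r, n + 1))


if __name__ == "__main__":
    # Illustrative values only: n = 35, r = 12, design prior Beta(25, 37).
    print(averaged_conditional_power(12, 35, 25.0, 37.0))
    print(beta_binom_upper_tail(12, 35, 25.0, 37.0))
```

The two printed numbers should agree up to integration error, because the rejection region does not depend on θ and the order of summation and integration can be exchanged.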
Note that if $\pi_D(\theta)$ is chosen as a point mass distribution centred on $\theta_D$, no uncertainty on the relevant design values is taken into account and the marginal distribution coincides with the sampling one. In this case, there is no difference between the frequentist power functions obtained under the conditional and the predictive approach.

Bayesian power functions and SSD methods
In the previous section, we have described how to select the sample size through power functions by assuming that a frequentist analysis will be performed at the end of the study. In both the frequentist conditional and predictive powers, the decision about the two hypotheses is based on the rejection region of $H_0$ of a classical test of fixed size $\alpha$. A major limitation of the fully classical and the hybrid classical-Bayesian approaches previously introduced is the inability to incorporate past experience and information about the unknown parameter, as well as expert prior opinions. A 'fully Bayesian approach' makes it possible to take into account important knowledge and beliefs about $\theta$ when planning the study.
It is well known that the information available before starting the study can be expressed by introducing a prior distribution for $\theta$, $\pi_A(\theta)$, which in this context is typically called analysis prior to distinguish it from the design prior. It is worth pointing out that $\pi_A(\theta)$ is the usual prior distribution employed in a Bayesian analysis: it formalizes pre-experimental knowledge, often represented by historical data, and subjective opinions of experts, and is used to compute the posterior distribution of the parameter, $\pi_n^A(\theta \mid y_n) \propto f_n(y_n \mid \theta)\, \pi_A(\theta)$. Moreover, it is often chosen as a non-informative distribution to avoid the inclusion of external evidence in the posterior inference.
Let us recall that, in general terms, a power function is defined as the probability of obtaining a significant result, i.e. a result that leads to the rejection of the null hypothesis. To exploit this function as a tool to determine the optimal sample size, we need to compute it under the assumption that the alternative hypothesis is true. In practice, we have to consider a design scenario where the true $\theta$ belongs to $\Theta_1$, so that the power function represents the probability of making a correct decision. Therefore, to define power functions from a Bayesian point of view, we first need to decide when the null hypothesis is rejected in a Bayesian setting, that is we have to establish the condition for 'Bayesian significance'. Following Spiegelhalter et al. [10], we define the result $Y_n$ as 'significant from a Bayesian perspective' if the corresponding posterior probability that $\theta$ belongs to the alternative hypothesis is sufficiently large, that is if

$$P_{\pi_n^A(\cdot \mid Y_n)}\left(\theta \in \Theta_1\right) > \lambda, \tag{6}$$

where $P_{\pi_n^A(\cdot \mid Y_n)}$ denotes the probability measure associated with the posterior distribution of $\theta$ computed using the analysis prior and $\lambda \in (0, 1)$ represents a suitably specified threshold. Let us stress that, since we are dealing with a pre-experimental problem, the posterior probability in Eq. (6) is a random variable, depending on a random result that has not yet been observed. In order to construct Bayesian power functions, we need to compute the probability of obtaining a Bayesian significant result. As in the frequentist case, we can use two alternative distributions of the data, according to the approach we decide to adopt.
The conditional approach realizes the pre-experimental assumption that the alternative hypothesis is true by fixing a design value $\theta_D \in \Theta_1$, which is considered relevant and important to detect. Then the sampling distribution of $Y_n$ conditional on $\theta_D$, $f_n(\cdot \mid \theta_D)$, is used to compute the probability of getting Bayesian significance. In this way, we obtain the Bayesian conditional power

$$\eta_B^C(n; \theta_D) = P_{f_n(\cdot \mid \theta_D)}\left[P_{\pi_n^A(\cdot \mid Y_n)}\left(\theta \in \Theta_1\right) > \lambda\right]. \tag{7}$$

The predictive approach, instead, aims at avoiding the problem of local optimality in the SSD procedure by introducing a design prior for $\theta$, $\pi_D(\theta)$, that accounts for the additional uncertainty involved in the choice of the design value $\theta_D$. Then, the prior predictive distribution of $Y_n$, $m_n^D(\cdot)$, is computed and used in place of the sampling distribution conditional on $\theta_D$. This leads to the Bayesian predictive power

$$\eta_B^P(n; \pi_D) = P_{m_n^D(\cdot)}\left[P_{\pi_n^A(\cdot \mid Y_n)}\left(\theta \in \Theta_1\right) > \lambda\right]. \tag{8}$$

Both the power functions in Eqs. (7) and (8) express the probability of rejecting $H_0$ under a Bayesian framework, assuming that the true $\theta$ actually belongs to $\Theta_1$. In fact, we assume either that $\theta$ is equal to a specific value under the alternative hypothesis (conditional approach) or that $\theta$ lies in the subspace defined under the alternative hypothesis, where it is distributed according to the design prior (predictive approach). The sample size determination criteria, therefore, require selecting the minimal sample size that ensures a sufficiently large level for $\eta_B^C(n; \theta_D)$ or $\eta_B^P(n; \pi_D)$. Moreover, note that, when the specified design prior distribution assigns the whole probability mass to $\theta_D$, the two Bayesian power functions coincide, leading to the same optimal sample size.

SSD criteria according to the nature of the distribution of $Y_n$
In this section, we explicitly formalize the SSD criteria based on frequentist and Bayesian power functions, according to the nature of the random result $Y_n$. When $Y_n$ has a continuous distribution, each of the power functions previously introduced is monotonically increasing as a function of $n$. In this case, the SSD criteria sensibly select the minimum sample size that guarantees the desired level of power, that is

$$n_F^C = \min\left\{n \in \mathbb{N} : \eta_F^C(n; \theta_D) \ge \gamma\right\}, \tag{9}$$

$$n_F^P = \min\left\{n \in \mathbb{N} : \eta_F^P(n; \pi_D) \ge \gamma\right\}, \tag{10}$$

$$n_B^C = \min\left\{n \in \mathbb{N} : \eta_B^C(n; \theta_D) \ge \gamma\right\}, \tag{11}$$

$$n_B^P = \min\left\{n \in \mathbb{N} : \eta_B^P(n; \pi_D) \ge \gamma\right\}, \tag{12}$$

for a conveniently chosen threshold $\gamma \in (0, 1]$. Let us remark that in the notation for the optimal sample sizes, as well as in the notation for the power functions, the subscripts specify the approach (frequentist or Bayesian) adopted at the analysis stage. The superscripts, instead, indicate the approach (conditional or predictive) used to represent the design expectations. An application of the criteria formalized above is provided by Gubbiotti and De Santis [11], where it is assumed that the statistic $Y_n$ follows a normal distribution with mean equal to $\theta$ and known variance.
However, it may happen that $\eta_F^C(n; \theta_D)$, $\eta_F^P(n; \pi_D)$, $\eta_B^C(n; \theta_D)$ and $\eta_B^P(n; \pi_D)$ are not monotonically increasing functions of the sample size: this occurs when dealing with discrete distributions of $Y_n$. In these cases, the power functions show a basically increasing behaviour as a function of $n$, but with some small fluctuations. A suitable SSD criterion has to take this behaviour into account. For instance, instead of selecting the smallest sample size that attains the condition of interest, it can be considered more appropriate to select the smallest sample size such that the condition is also fulfilled for all larger sample sizes. Given a threshold $\gamma \in (0, 1)$, the corresponding SSD criteria are

$$n_F^C = \min\left\{n^* \in \mathbb{N} : \eta_F^C(n; \theta_D) \ge \gamma, \ \forall\, n \ge n^*\right\}, \tag{13}$$

$$n_F^P = \min\left\{n^* \in \mathbb{N} : \eta_F^P(n; \pi_D) \ge \gamma, \ \forall\, n \ge n^*\right\}, \tag{14}$$

$$n_B^C = \min\left\{n^* \in \mathbb{N} : \eta_B^C(n; \theta_D) \ge \gamma, \ \forall\, n \ge n^*\right\}, \tag{15}$$

$$n_B^P = \min\left\{n^* \in \mathbb{N} : \eta_B^P(n; \pi_D) \ge \gamma, \ \forall\, n \ge n^*\right\}. \tag{16}$$

In this way, it is possible to avoid the paradox of having the condition of interest fulfilled for the selected sample size, but no longer satisfied for some larger values of $n$.
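The difference between the two kinds of criteria can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the power function is passed in as a callable, and the "for all larger n" condition is necessarily checked up to a finite bound `n_max` chosen by the user, which is an assumption of the sketch rather than part of the criteria.

```python
def smallest_n(power, gamma, n_min, n_max):
    """First n in [n_min, n_max] with power(n) >= gamma, or None if there is none."""
    for n in range(n_min, n_max + 1):
        if power(n) >= gamma:
            return n
    return None


def conservative_n(power, gamma, n_min, n_max):
    """Smallest n* such that power(n) >= gamma for every n in [n*, n_max].

    Returns None when the condition still fails at n_max.
    """
    last_failure = None
    for n in range(n_min, n_max + 1):
        if power(n) < gamma:
            last_failure = n
    if last_failure is None:
        return n_min
    if last_failure == n_max:
        return None
    return last_failure + 1


if __name__ == "__main__":
    # Toy saw-toothed power curve (made-up numbers, not taken from the chapter).
    toy = {1: 0.50, 2: 0.70, 3: 0.82, 4: 0.78, 5: 0.85, 6: 0.90}
    print(smallest_n(toy.get, 0.8, 1, 6))      # 3
    print(conservative_n(toy.get, 0.8, 1, 6))  # 5
```

In the toy curve the threshold is first reached at n = 3 but missed again at n = 4, so the conservative rule moves on to n = 5.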

Single binomial proportion using exact methods
In this section, we focus on exact procedures for the one-sample testing problem with binary response. For instance, in a clinical context, we could be interested in evaluating the efficacy of a new experimental treatment or drug that is received at the same dose by all the $n$ patients enrolled in the trial. No comparisons with other therapies are involved. A binary response variable, which takes the value 1 if clinicians classify the patient as a responder to the therapy and 0 otherwise, is considered and, therefore, the parameter of interest $\theta$ is the true response rate (i.e. an unknown proportion). In these one-arm studies, $\theta$ is compared with a fixed target value, say $\theta_0$, that should ideally represent the response rate for the current 'gold standard' therapy and that is typically obtained from historical data. Values of $\theta$ greater than $\theta_0$ suggest that the experimental drug can be considered sufficiently effective and, therefore, the following hypotheses are considered:

$$H_0: \theta \le \theta_0 \quad \text{versus} \quad H_1: \theta > \theta_0. \tag{17}$$

Single-arm studies of this kind are typically conducted in phase II of clinical trials, whose primary goal is not to definitively assess the efficacy of new drugs, but to screen out those that are ineffective. In practice, in the clinical development process of a new drug, phase II aims at preventing treatments that are not sufficiently promising from reaching phase III, where randomized controlled trials based on large groups of patients are generally conducted.
It is important to point out that the power functions based on exact procedures usually do not have explicit forms. Hence, exact formulas for sample size calculations cannot be obtained. However, it is possible to proceed numerically by evaluating the conditions of interest for different increasing or decreasing values of the sample size, until reaching the optimal one. In the following sections, we provide the expressions of the frequentist and Bayesian power functions for non-comparative studies with binary responses. The saw-toothed shape of the power curves as a function of n is shown and, hence, the conservative criteria illustrated in the previous section are adopted. All the graphical and numerical results have been obtained by using the R programming language [12].

Frequentist conditional power
In the statistical context described above, the number of responders out of the $n$ patients treated with the new drug (i.e. the number of successes in $n$ trials) is the natural statistic $Y_n$ to consider, and its sampling distribution is

$$f_n(y_n \mid \theta) = \text{bin}(y_n; n, \theta), \quad \text{for } y_n = 0, \ldots, n, \tag{18}$$

where $\text{bin}(\cdot\,; n, \theta)$ denotes the probability mass function of a binomial distribution with parameters $n$ and $\theta$.
Let us consider the two hypotheses in Eq. (17). For a fixed significance level $\alpha$ and assuming that $H_0$ is true, there exists a non-negative integer $r$ between 0 and $n$ such that

$$\sum_{i=r}^{n} \text{bin}(i; n, \theta_0) \le \alpha \quad \text{and} \quad \sum_{i=r-1}^{n} \text{bin}(i; n, \theta_0) > \alpha. \tag{19}$$

Then, the rejection region at level $\alpha$ is $R_{H_0} = \{y_n \in \{0, 1, \ldots, n\} : y_n \ge r\}$, where the critical value $r$ can be expressed in symbols by

$$r = \min\left\{k \in \{0, 1, \ldots, n\} : \sum_{i=k}^{n} \text{bin}(i; n, \theta_0) \le \alpha\right\}. \tag{20}$$

For a given design value $\theta_D$, specified under the alternative hypothesis, the frequentist conditional power is

$$\eta_F^C(n; \theta_D) = P_{f_n(\cdot \mid \theta_D)}\left(Y_n \in R_{H_0}\right) = \sum_{y_n = r}^{n} \text{bin}(y_n; n, \theta_D). \tag{21}$$

In practice, $\eta_F^C(n; \theta_D)$ is the sum of the probabilities of all the outcomes that belong to $R_{H_0}$, when we assume that the true $\theta$ is equal to the design value.

The reasons for the saw-toothed behaviour of this power function, shown in Figure 1, can be clarified by the numerical results presented in Table 1. Here, for all the sample sizes between 3 and 50, we provide not only the level of the frequentist conditional power used to obtain Figure 1, but also the corresponding critical value $r$ and the actual type I error probability. By construction, this latter value never exceeds the fixed threshold 0.05. Note that whenever the sample size is increased by one unit, the corresponding critical value $r$ either increases by one unit or remains constant. If it remains constant, both the actual type I error rate and the conditional frequentist power increase; if the critical value also increases by one unit, they both decrease.
To help in reading the table, sample sizes with the same critical value are grouped into blocks: within each block both the power and the actual type I error rate rise monotonically as $n$ increases, but at the first sample size of the subsequent block they both decrease. This determines the basically increasing behaviour of the power as a function of $n$, with some small fluctuations, which is represented in Figure 1. For additional discussion about the saw-toothed shape of the frequentist power function, the reader is referred to Chernick and Liu [13].
Because of the non-monotonic behaviour of $\eta_F^C(n; \theta_D)$, the question arises of which sample size should be selected. If we set the desired threshold $\gamma$ for the power equal to 0.8, the smallest sample size that meets the power requirement is $n = 35$. At that sample size, the critical value is 12 and the power level is 0.8048. For $n = 36$, the critical value is still 12 and the power increases to 0.8380. However, when $n = 37$, at which $r = 13$, the power drops below 0.8 to 0.7783, and it rises again above 0.8 when $n = 38$. From then on, $\eta_F^C(n; \theta_D)$ never falls below 0.8 for sample sizes greater than 38. Therefore, instead of selecting the smallest $n$ that attains the power condition, it can be more appropriate to consider the more conservative sample size criterion formalized in Section 4, according to which the optimal sample size is selected as

$$n_F^C = \min\left\{n^* \in \mathbb{N} : \eta_F^C(n; \theta_D) \ge \gamma, \ \forall\, n \ge n^*\right\}. \tag{22}$$

The criterion ensures that the power will not fall below the desired threshold for any larger sample size: in our specific case, it consists in selecting $n = 38$ instead of $n = 35$.
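The calculations behind Eqs. (20) to (22) can be sketched in Python as follows. The chapter's own results were obtained in R; this is an independent illustration with hypothetical function names, and it assumes θ_0 = 0.2, θ_D = 0.4, α = 0.05 and γ = 0.8, the values used elsewhere in this section. The search for the conservative sample size is truncated at a finite `n_max`, which is a choice of the sketch.

```python
from math import comb


def binom_upper_tail(r, n, theta):
    """P(Y_n >= r) for Y_n ~ Bin(n, theta)."""
    return sum(comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(r, n + 1))


def critical_value(n, theta0, alpha):
    """Eq. (20): smallest r with P(Y_n >= r | theta0) <= alpha.

    Returns n + 1 (empty rejection region) if no such r exists.
    """
    for r in range(n + 1):
        if binom_upper_tail(r, n, theta0) <= alpha:
            return r
    return n + 1


def freq_conditional_power(n, theta0, theta_d, alpha):
    """Eq. (21): probability of the rejection region when theta = theta_d."""
    return binom_upper_tail(critical_value(n, theta0, alpha), n, theta_d)


def smallest_n(power, gamma, n_min, n_max):
    """First n in [n_min, n_max] reaching the threshold, or None."""
    return next((n for n in range(n_min, n_max + 1) if power(n) >= gamma), None)


def conservative_n(power, gamma, n_min, n_max):
    """Eq. (22), checked only up to n_max: one past the last n that misses gamma."""
    failures = [n for n in range(n_min, n_max + 1) if power(n) < gamma]
    if not failures:
        return n_min
    return None if failures[-1] == n_max else failures[-1] + 1


if __name__ == "__main__":
    def power(n):
        return freq_conditional_power(n, theta0=0.2, theta_d=0.4, alpha=0.05)

    for n in range(35, 39):
        print(n, critical_value(n, 0.2, 0.05), round(power(n), 4))
    print(smallest_n(power, 0.8, 3, 100), conservative_n(power, 0.8, 3, 100))
```

With these inputs the sketch should reproduce the pattern described in the text: the threshold is first met at n = 35, missed again at n = 37, and met for good from n = 38.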

Frequentist predictive power
In order to model uncertainty in the specification of the design value, we need to adopt the hybrid classical-Bayesian approach described previously. We introduce a beta design prior density for $\theta$, $\pi_D(\theta) = \text{beta}(\theta; \alpha_D, \beta_D)$, that is used to obtain the prior predictive distribution of the data. It is well known that by averaging the binomial sampling distribution $f_n(y_n \mid \theta)$ with respect to the beta design prior, we obtain the following marginal distribution:

$$m_n^D(y_n) = \text{beta-bin}(y_n; \alpha_D, \beta_D, n), \quad \text{for } y_n = 0, \ldots, n, \tag{23}$$

where $\text{beta-bin}(\cdot\,; \alpha_D, \beta_D, n)$ denotes the probability mass function of a beta-binomial distribution with parameters $(\alpha_D, \beta_D, n)$.
The design prior $\pi_D(\theta)$ can be elicited in many different ways. One useful possibility consists in (i) setting the prior mode equal to the fixed design value $\theta_D$, which investigators would choose within the subset under $H_1$ when using the conditional approach, and (ii) regulating the concentration of the distribution around its mode according to the degree of uncertainty one wishes to express. This can be done by using the following expressions for the hyperparameters of $\pi_D(\theta)$:

$$\alpha_D = n_D\, \theta_D + 1 \quad \text{and} \quad \beta_D = n_D\, (1 - \theta_D) + 1, \tag{24}$$

where $\theta_D$ is the prior mode and $n_D$ is a design parameter that can be interpreted as a prior sample size. The larger $n_D$, the smaller the variance of the beta design prior. Therefore, we need to increase $n_D$ if we want to reduce uncertainty on the guessed values of $\theta$. More specifically, if we set $n_D = \infty$, the design prior of $\theta$ assigns all the probability mass to $\theta_D$: in this case, no uncertainty is involved and the marginal distribution of the data coincides with the sampling distribution conditional on $\theta_D$. We must therefore set $n_D < \infty$ to distinguish between the conditional and predictive approaches. In particular, once a prior mode $\theta_D$ has been selected, the researcher can choose $n_D$ so as to ensure a large level (say very close to 1) for $P_{\pi_D(\cdot)}(\theta > \theta_0)$, that is the probability assigned by $\pi_D(\theta)$ to the event $\theta > \theta_0$. Let us assume, for instance, that $\theta_0 = 0.2$ and consider three possible choices for $\theta_D$ (i.e. 0.3, 0.4 and 0.5). For each of them, we compute the smallest $n_D$ such that $P_{\pi_D(\cdot)}(\theta > \theta_0)$ is about equal to 0.999, and the behaviour of the corresponding design priors is shown in Figure 2(a). Clearly, as the prior mode approaches $\theta_0$, we need to increase $n_D$ to guarantee that $P_{\pi_D(\cdot)}(\theta > \theta_0) \simeq 0.999$.
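A small Python sketch can make the elicitation step concrete. It assumes that the hyperparameters are obtained from the prior mode and the prior sample size as α_D = n_D θ_D + 1 and β_D = n_D (1 − θ_D) + 1, and it evaluates P(θ > θ_0) with a simple Simpson rule; the function names, the number of integration steps and the restriction to shape parameters larger than 1 are all assumptions of the sketch.

```python
from math import exp, lgamma, log


def beta_params(mode, n_prior):
    """Beta hyperparameters from the prior mode and the prior sample size."""
    return n_prior * mode + 1.0, n_prior * (1.0 - mode) + 1.0


def beta_pdf(theta, a, b):
    """Beta(a, b) density; returns 0 at the endpoints, which is right only if a, b > 1."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))


def beta_upper_tail(theta0, a, b, steps=4000):
    """P(theta > theta0) under Beta(a, b), by composite Simpson on [theta0, 1].

    `steps` must be even; a and b are assumed to be larger than 1.
    """
    h = (1.0 - theta0) / steps
    total = beta_pdf(theta0, a, b) + beta_pdf(1.0, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(theta0 + i * h, a, b)
    return total * h / 3.0


if __name__ == "__main__":
    theta0 = 0.2
    # (theta_D, n_D) pairs quoted in the text for Figure 2(a).
    for theta_d, n_d in [(0.3, 163), (0.4, 43), (0.5, 20)]:
        a, b = beta_params(theta_d, n_d)
        print(theta_d, n_d, beta_upper_tail(theta0, a, b))
```

The same two helper functions apply unchanged to an analysis prior expressed through a prior mode and a prior sample size.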
Moreover, for a fixed prior mode $\theta_D$, decreasing the value of $n_D$ below the one used in the graph would spread the design prior further, so that it would no longer assign a negligible probability to values of $\theta$ under the null hypothesis.

Once $\pi_D(\theta)$ has been specified, the frequentist predictive power can be obtained by computing the probability of rejecting the null hypothesis at level $\alpha$ with respect to $m_n^D(y_n)$. Hence, we have

$$\eta_F^P(n; \pi_D) = P_{m_n^D(\cdot)}\left(Y_n \in R_{H_0}\right) = \sum_{y_n = r}^{n} \text{beta-bin}(y_n; \alpha_D, \beta_D, n), \tag{25}$$

where $r$ is the critical value provided in Eq. (20). In practice, $\eta_F^P(n; \pi_D)$ is the sum of the probabilities of all the outcomes inside $R_{H_0}$, computed under a design scenario according to which the true $\theta$ belongs to the interval $(\theta_0, 1)$, where it is distributed according to the design prior density. Let us remark again that if the design prior is a point mass distribution on $\theta_D$ (i.e. $n_D = \infty$), the conditional and predictive frequentist power functions coincide.
Like the frequentist conditional power, the predictive one has a saw-toothed shape as a function of $n$, since $m_n^D(y_n)$ is a discrete distribution. Therefore, we suggest adopting the conservative approach previously described and selecting

$$n_F^P = \min\left\{n^* \in \mathbb{N} : \eta_F^P(n; \pi_D) \ge \gamma, \ \forall\, n \ge n^*\right\}, \tag{26}$$

for a fixed desired threshold $\gamma$. Figure 3 shows the behaviour of the frequentist predictive power as a function of $n$ for different choices of the design prior, when $\theta_0 = 0.2$ and $\alpha = 0.05$. More specifically, we consider the three $\pi_D(\theta)$ plotted in Figure 2(b), which are all centred on $\theta_D = 0.4$ but have different degrees of concentration, regulated by the value of $n_D$. In each graph, we highlight the optimal sample size obtained according to the criterion in Eq. (26) when $\gamma = 0.8$. Note that the larger $n_D$, the smaller the degree of uncertainty we introduce through the design prior and, as a consequence, the smaller the optimal sample size. In fact, we obtain the optimal values 46, 42 and 39 for $n_D$ equal to 60, 111 and 255, respectively. If we set $n_D = \infty$, we would retrieve the conditional criterion in Eq. (22), where no uncertainty is considered in specifying the design value, and the optimal $n$ would be equal to 38 (see Figure 1).

Figure 3. Behaviour of $\eta_F^P(n; \pi_D)$ as a function of $n$ for different choices of the design prior distribution, when $\theta_0 = 0.2$ and $\alpha = 0.05$.

Moreover, let us fix again $\theta_0 = 0.2$, $\alpha = 0.05$ and $\gamma = 0.8$ and consider the three design prior distributions in Figure 2(a), which are characterized by different prior modes. The evident difference between the prior scenarios represented by these design priors clearly affects the optimal sample size: we obtain the optimal values 157, 46 and 23 for $(\theta_D, n_D) = (0.3, 163)$, $(\theta_D, n_D) = (0.4, 43)$ and $(\theta_D, n_D) = (0.5, 20)$, respectively.
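The frequentist predictive power can be computed along the same lines as the conditional one, replacing the binomial probabilities with beta-binomial ones. The Python sketch below is illustrative only: function names are hypothetical, the design prior is assumed to be parametrized through its mode θ_D and prior sample size n_D as α_D = n_D θ_D + 1, β_D = n_D (1 − θ_D) + 1, and the beta-binomial mass function is evaluated on the log scale with `math.lgamma`.

```python
from math import comb, exp, lgamma, log


def binom_upper_tail(r, n, theta):
    """P(Y_n >= r) for Y_n ~ Bin(n, theta)."""
    return sum(comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(r, n + 1))


def critical_value(n, theta0, alpha):
    """Smallest r with P(Y_n >= r | theta0) <= alpha; n + 1 if none exists."""
    for r in range(n + 1):
        if binom_upper_tail(r, n, theta0) <= alpha:
            return r
    return n + 1


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def beta_binom_pmf(y, n, a, b):
    """Beta-binomial probability of y successes out of n with shape parameters a, b."""
    return exp(log(comb(n, y)) + log_beta(y + a, n - y + b) - log_beta(a, b))


def freq_predictive_power(n, theta0, alpha, theta_d, n_d):
    """Probability of the frequentist rejection region under the marginal distribution."""
    a, b = n_d * theta_d + 1.0, n_d * (1.0 - theta_d) + 1.0
    r = critical_value(n, theta0, alpha)
    return sum(beta_binom_pmf(y, n, a, b) for y in range(r, n + 1))


if __name__ == "__main__":
    # Design priors centred on 0.4 with the three n_D values quoted in the text.
    for n_d in (60, 111, 255):
        print(n_d, [round(freq_predictive_power(n, 0.2, 0.05, 0.4, n_d), 4)
                    for n in range(38, 48)])
```

Feeding `freq_predictive_power` into a conservative search over n gives the predictive sample size; whether the sketch returns exactly the values reported in the text depends on the assumed parametrization of the design prior.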

Bayesian conditional power
When we decide to adopt a Bayesian approach to establish the statistical significance of the result, we need to introduce an analysis prior distribution for $\theta$. In our specific case, it is computationally convenient to specify a beta analysis prior, $\pi_A(\theta) = \text{beta}(\theta; \alpha_A, \beta_A)$: in this way, from conjugate analysis we obtain that the corresponding posterior distribution is still a beta density with updated parameters,

$$\pi_n^A(\theta \mid y_n) = \text{beta}(\theta; \alpha_A + y_n, \beta_A + n - y_n). \tag{27}$$

Through $\pi_A(\theta)$, the researcher can incorporate in the SSD procedure pre-experimental knowledge, as well as sceptical or enthusiastic expert prior opinions about the efficacy of the experimental treatment. However, one of the most common ways of proceeding is to choose a non-informative (or only weakly informative) density, so that the posterior distribution is based almost entirely on the evidence in the data. We could, therefore, specify $\pi_A(\theta) = \text{beta}(\theta; 1, 1)$ or consider the non-informative Jeffreys prior. Alternatively, if we want to use informative analysis prior distributions, we can express the hyperparameters in terms of the prior mode $\theta_A$ and the prior sample size $n_A$, that is

$$\alpha_A = n_A\, \theta_A + 1 \quad \text{and} \quad \beta_A = n_A\, (1 - \theta_A) + 1. \tag{28}$$

In this way, for instance, it is possible to express scepticism or optimism about large treatment effects by setting $\theta_A$ lower or higher than the target $\theta_0$, respectively. When $\theta_A < \theta_0$, the larger $n_A$, the larger the degree of scepticism expressed; when $\theta_A > \theta_0$, larger values of $n_A$ increase the degree of enthusiasm taken into account. However, the value $n_A = 1$ is often used to obtain a weakly informative prior distribution. The upper panel of Figure 4 shows three possible choices for the analysis prior when $\theta_0 = 0.2$. These distributions are obtained by fixing the prior mode $\theta_A$ and then selecting $n_A$ so that $P_{\pi_A(\cdot)}(\theta > \theta_0)$ (i.e. the probability assigned by $\pi_A(\theta)$ to the event $\theta > \theta_0$) is about equal to a desired level.
More specifically, we have considered (i) a sceptical prior mode $\theta_A = 0.1$ and $P_{\pi_A(\cdot)}(\theta > \theta_0) \simeq 0.4$, (ii) a neutral prior mode $\theta_A = 0.2$ and $P_{\pi_A(\cdot)}(\theta > \theta_0) \simeq 0.6$ and finally (iii) an enthusiastic prior mode $\theta_A = 0.3$ and $P_{\pi_A(\cdot)}(\theta > \theta_0) \simeq 0.8$. The corresponding values of $n_A$ are 7, 14 and 4, respectively. These densities will be used to illustrate how the optimal sample sizes based on Bayesian powers are affected by the information formalized through the analysis priors.
The random result $Y_n$ is defined as 'significant' from a Bayesian perspective if the corresponding posterior probability that $\theta > \theta_0$ is sufficiently large. In symbols, we decide to reject the null hypothesis, on the basis of the result $Y_n$, if the following condition is satisfied:

$$P_{\pi_n^A(\cdot \mid Y_n)}\left(\theta > \theta_0\right) > \lambda, \tag{29}$$

where $P_{\pi_n^A(\cdot \mid Y_n)}$ is the probability measure associated with the posterior distribution in Eq. (27) and $\lambda \in (0, 1)$ is a pre-specified threshold. It is worth noting that, for a given value of $n$, the posterior quantity $P_{\pi_n^A(\cdot \mid Y_n)}(\theta > \theta_0)$ is an increasing function of $Y_n$. As a consequence, we can find a non-negative integer $\tilde{r}$ between 0 and $n$ such that

$$P_{\pi_n^A(\cdot \mid \tilde{r})}\left(\theta > \theta_0\right) > \lambda \quad \text{and} \quad P_{\pi_n^A(\cdot \mid \tilde{r} - 1)}\left(\theta > \theta_0\right) \le \lambda, \tag{30}$$

and we can claim that $H_0$ is rejected if the observed number of responders $y_n$ is equal to or greater than $\tilde{r}$. In practice, $\tilde{r}$ represents the smallest number of successes such that the condition for Bayesian significance is satisfied, and in symbols it can be expressed by

$$\tilde{r} = \min\left\{k \in \{0, 1, \ldots, n\} : P_{\pi_n^A(\cdot \mid k)}\left(\theta > \theta_0\right) > \lambda\right\}. \tag{31}$$

By considering a fixed design value $\theta_D$ greater than $\theta_0$, the Bayesian conditional power is therefore obtained as

$$\eta_B^C(n; \theta_D) = P_{f_n(\cdot \mid \theta_D)}\left[P_{\pi_n^A(\cdot \mid Y_n)}\left(\theta > \theta_0\right) > \lambda\right] = \sum_{y_n = \tilde{r}}^{n} \text{bin}(y_n; n, \theta_D). \tag{32}$$

Essentially, it is given by the sum of the probabilities of all the Bayesian significant results, computed assuming that the true $\theta$ is equal to $\theta_D$.
Since we are dealing with discrete data, this power function is also not monotonically increasing as a function of $n$. Let us assume that $\theta_0 = 0.2$, $\theta_D = 0.4$ and $\lambda = 0.9$. The detailed calculations shown in Table 2 help to understand why $\eta_B^C(n; \theta_D)$ has the typical saw-toothed behaviour. For each sample size between 3 and 50, the table provides the corresponding value of $\tilde{r}$, the level of the Bayesian conditional power and the posterior probability that $\theta$ exceeds $\theta_0$ conditional on the result $\tilde{r}$. By definition of $\tilde{r}$, these latter values are always larger than the threshold $\lambda = 0.9$. White and grey colours are used alternately to highlight blocks of sample sizes with the same value of $\tilde{r}$. When the sample size grows but $\tilde{r}$ remains constant, $P_{\pi_n^A(\cdot \mid \tilde{r})}(\theta > \theta_0)$ decreases, while $\eta_B^C(n; \theta_D)$ increases. However, when both $n$ and $\tilde{r}$ are simultaneously increased by one unit, $P_{\pi_n^A(\cdot \mid \tilde{r})}(\theta > \theta_0)$ jumps up, while the Bayesian power drops.
Because of the saw-toothed nature of the power curve, for a fixed threshold $\gamma$, the optimal sample size is selected using the conservative criterion, that is, as the smallest sample size such that the power exceeds $\gamma$ for that and every larger value of $n$:

$$n_C^B = \min\left\{ n^* \in \mathbb{N} : \eta_C^B(n; \theta_D) > \gamma, \ \forall\, n \geq n^* \right\}. \tag{33}$$

The lower panel of Figure 4 shows the behaviour of the Bayesian conditional power as a function of $n$ for each of the three analysis prior densities plotted in the upper panel, when $\theta_0 = 0.2$, $\theta_D = 0.4$ and $\lambda = 0.9$. Each graph indicates the optimal sample size according to the criterion in Eq. (33) for $\gamma = 0.8$. As expected, as we move from sceptical prior opinions towards more enthusiastic beliefs about the efficacy of the experimental treatment, the required sample size decreases.
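A possible implementation of a conservative selection rule is sketched below. We read the conservative criterion as choosing the smallest sample size beyond which the power never falls back to $\gamma$ or below; this reading, the finite search horizon `n_max` that replaces the unbounded check, and the uniform Beta(1, 1) analysis prior are all assumptions made for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def conditional_power(n, theta0, theta_d, lam):
    """Bayesian conditional power under a uniform Beta(1, 1) analysis prior.

    With this (assumed) prior, P(theta > theta0 | r) = P(Bin(n + 1, theta0) <= r).
    """
    for r in range(n + 1):
        if binom_cdf(r, n + 1, theta0) > lam:
            return 1.0 - binom_cdf(r - 1, n, theta_d)
    return 0.0  # no outcome is Bayesian significant


def conservative_sample_size(power_fn, gamma, n_max):
    """Smallest n* such that power_fn(n) > gamma for every n in [n*, n_max].

    Scans downwards from the horizon n_max and stops at the first failure.
    Returns None when the power at n_max itself does not exceed gamma.
    """
    n_star = None
    for n in range(n_max, 0, -1):
        if power_fn(n) > gamma:
            n_star = n
        else:
            break
    return n_star


if __name__ == "__main__":
    def power(n):
        return conditional_power(n, theta0=0.2, theta_d=0.4, lam=0.9)

    print(conservative_sample_size(power, gamma=0.8, n_max=100))
```

Scanning downwards from the horizon avoids stopping at the first sample size whose power exceeds $\gamma$, which may be followed by a drop below the threshold at the next tooth of the curve.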

Bayesian predictive power
Besides introducing pre-experimental information, if we also wish to model uncertainty on the design value, we have to consider the Bayesian predictive power. Therefore, as described in Section 5.3, we elicit an analysis prior distribution to obtain the beta posterior density $\pi_n^A(\theta \mid y_n)$. Moreover, following the indications provided in Section 5.2, we introduce a design prior distribution to construct the marginal distribution $m_n^D(y_n)$.
The Bayesian predictive power is computed by adding the probabilities of all the Bayesian significant results, computed under the design scenario expressed through the design prior. Thus, we have

$$\eta_P^B(n; \pi^D) = \sum_{y_n = \tilde{r}}^{n} \text{beta-bin}(y_n; \alpha_D, \beta_D, n),$$

where $\tilde{r}$ is given in Eq. (31). Clearly, $\eta_P^B(n; \pi^D)$ also shows the typical saw-toothed behaviour as a function of $n$, because of the discrete nature of the beta-binomial marginal distribution of $y_n$. Therefore, given a desired threshold $\gamma$ and following the conservative approach previously used, we select the optimal sample size as

$$n_P^B = \min\left\{ n^* \in \mathbb{N} : \eta_P^B(n; \pi^D) > \gamma, \ \forall\, n \geq n^* \right\}.$$

Table 2. Numerical calculations to explain the saw-toothed behaviour of $\eta_C^B(n; \theta_D)$ as a function of $n$: sample sizes, the corresponding value of $\tilde{r}$, the Bayesian conditional power and the posterior probability that $\theta > \theta_0$ when the observed result is equal to $\tilde{r}$ successes, for $\theta_0 = 0.2$, $\theta_D = 0.4$ and $\lambda = 0.9$.
In Table 3 we provide the values of $n_P^B$ for different choices of the analysis and design prior densities. More specifically, we consider the three analysis priors plotted in the upper panel of Figure 4 and the design prior distributions represented in both panels of Figure 2, when $\theta_0 = 0.2$ and $\lambda = 0.9$. As in the case of the Bayesian conditional power, the sample sizes obtained under the sceptical analysis prior are uniformly larger than those obtained under the more enthusiastic distributions. As regards the impact of the design priors, the stronger the degree of uncertainty on the appropriate design value expressed by $\pi^D(\theta)$, the larger the required sample size. For instance, for a fixed prior mode of the design prior, $n_P^B$ increases as $n_D$ gets smaller (see Table 3(b), where $\theta_D = 0.4$). However, more evident changes in the sample size arise when we compare the effects of design priors based on different prior modes (see the results in Table 3(a), where the design priors represent very distant design scenarios).
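The predictive power discussed above can be sketched in code as follows. The example assumes a uniform Beta(1, 1) analysis prior and a beta design prior parameterised through its mode and a prior sample size; both the helper `design_prior_params` and this parameterisation are hypothetical choices made for illustration and are not taken from Section 5.2. The beta-binomial probabilities are evaluated on the log scale with `math.lgamma`.

```python
from math import comb, exp, lgamma


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def r_tilde(n, theta0, lam):
    """Smallest k with P(theta > theta0 | k) > lam under a uniform Beta(1, 1)
    analysis prior (assumption); returns n + 1 if no result is significant."""
    for k in range(n + 1):
        if binom_cdf(k, n + 1, theta0) > lam:
            return k
    return n + 1


def beta_binom_pmf(y, n, alpha_d, beta_d):
    """Beta-binomial mass C(n, y) * B(y + alpha_d, n - y + beta_d) / B(alpha_d, beta_d)."""
    log_pmf = (
        lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
        + lgamma(y + alpha_d) + lgamma(n - y + beta_d) - lgamma(n + alpha_d + beta_d)
        - lgamma(alpha_d) - lgamma(beta_d) + lgamma(alpha_d + beta_d)
    )
    return exp(log_pmf)


def predictive_power(n, theta0, lam, alpha_d, beta_d):
    """Sum of beta-binomial probabilities over the significant results y >= r_tilde."""
    r = r_tilde(n, theta0, lam)
    return sum(beta_binom_pmf(y, n, alpha_d, beta_d) for y in range(r, n + 1))


def design_prior_params(theta_d, n_d):
    """Hypothetical parameterisation: beta prior with mode theta_d and prior size n_d."""
    return n_d * theta_d + 1, n_d * (1 - theta_d) + 1


if __name__ == "__main__":
    for n_d in (10, 50, 1000):
        a_d, b_d = design_prior_params(0.4, n_d)
        print(n_d, round(predictive_power(30, 0.2, 0.9, a_d, b_d), 4))
```

With a very large prior sample size the design prior concentrates around its mode, so the predictive power approaches the conditional power computed at that design value, in line with the conditional approach being a special case of the predictive one.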
These Bayesian predictive SSD procedures, which include the conditional ones as a special case, have been exploited in Ref. [8] to construct single-arm two-stage designs for phase II clinical trials based on binary data. An extension to the randomized case has been presented in Ref. [14], while in Ref. [15] the same procedures have been implemented with the additional possibility of taking into account uncertainty in the historical response rate.

Conclusions
Especially in clinical research, the pre-experimental power analysis is one of the most commonly used methods for sample size calculations. It is tacitly implied that the power function is constructed under a frequentist framework. However, it is possible to introduce Bayesian concepts in the power analysis to provide more flexibility to the sample size determination process.
Table 3. $n_P^B$ for different choices of the analysis and the design priors, when $\theta_0 = 0.2$ and $\lambda = 0.9$.

When the power function is used as a tool to obtain the appropriate sample size, the general idea is to ensure a large probability of correctly rejecting the null hypothesis $H_0$ when it is actually false because the true $\theta$ belongs to $H_1$. Therefore, the conjecture that the alternative hypothesis is true represents an essential element of the method. It can be realized by assuming that the true $\theta$ is equal to a fixed design value $\theta_D$, suitably selected inside $H_1$ (conditional approach); alternatively, we can introduce uncertainty on the guessed design value through a design prior distribution that assigns negligible probability to values of $\theta$ under $H_0$ (predictive approach). Moreover, the decision about the rejection of $H_0$ can be made under a frequentist framework or by performing a Bayesian analysis. In the latter case, any pre-experimental information available can be incorporated in the methodology through the specification of an analysis prior distribution. By combining frequentist and Bayesian procedures of analysis with both the conditional and predictive approaches, we obtain the four power functions described in this chapter. Let us remark that the Bayesian predictive power is the one that offers the greatest flexibility in the sample size calculations, since it lets the researcher take into account prior knowledge as well as uncertainty on the design value. If desired, design uncertainty can be ruled out by adopting a point-mass design distribution. Likewise, if no prior information is available, it is possible to elicit a non-informative analysis prior and let the analysis be based entirely on the data.