Parametrized ringdown spin expansion coefficients: a data-analysis framework for black-hole spectroscopy with multiple events

Black-hole spectroscopy is arguably the most promising tool to test gravity in extreme regimes and to probe the ultimate nature of black holes with unparalleled precision. These tests are currently limited by the lack of a ringdown parametrization that is both robust and accurate. We develop an observable-based parametrization of the ringdown of spinning black holes beyond general relativity, which we dub ParSpec (Parametrized Ringdown Spin Expansion Coefficients). This approach is perturbative in the spin, but it can be made arbitrarily precise (at least in principle) through a high-order expansion. It requires O(10) ringdown detections, which should be routinely available with the planned space mission LISA and with third-generation ground-based detectors. We provide a preliminary analysis of the projected bounds on parametrized ringdown parameters with LISA and with the Einstein Telescope, and discuss extensions of our model that can be straightforwardly included in the future.

The post-merger ringdown signal from a remnant BH can be modeled as a superposition of damped sinusoids [7][8][9], each defined by an oscillation frequency ω and a damping time τ . Owing to the BH uniqueness and no-hair theorems [16][17][18][19], the entire QNM spectrum of a spinning (Kerr) BH in general relativity (GR) is completely determined by the mass M and spin J = χM 2 of the BH. (We use G = c = 1 throughout.) Thus, measuring one frequency and damping time allows us to infer the mass and spin of a merger remnant from the ringdown signal only, whereas measuring more than two quantities (i.e., also subdominant modes) provides multiple independent null-hypothesis tests of GR [3,4,[20][21][22][23][24][25][26][27]. These tests require high signal-to-noise ratio (SNR) in the ringdown [4,20] and will become routinely available with the space mission LISA [28] and with third-generation (3G) ground-based GW detectors (such as the proposed Einstein Telescope (ET) [29] and Cosmic Explorer [30]), which are expected to detect several ringdown events per year with SNR in the hundreds to thousands, even from sources at cosmological distance [31].
The LIGO/Virgo Collaboration checked that the full inspiral-merger-ringdown waveform is consistent with GR by analyzing separately the lower-frequency signal emitted during the inspiral phase and the higherfrequency signal emitted during the late inspiral, merger and ringdown of the first event, GW150914 [32]. Separately fitting each of these signals to GR-based templates leads to two independent estimates of the mass and dimensionless spin of the remnant BH. An extension of this analysis to seven selected binary BH events found that the two estimates are compatible with each other within statistical errors of order 30% (see Fig. 2 in [33]). Recent work tried to better quantify the contribution of additional overtones to the high-frequency signal for GW150914. Adding at least one overtone is necessary to obtain ringdown estimates of the mass and spin of the remnant which are in agreement (at the ∼ 20% level) with the values inferred by fitting the entire signal within GR [34,35] (see also [20,24,36]).
Going beyond these consistency tests requires modeling the BH ringdown beyond GR, for instance to perform a Bayesian model selection between GR and any proposed extension of the theory. This is a challenging task and, despite recent progress [37][38][39][40][41][42][43][44], all current attempts have significant limitations: they are based on particular classes of theories, they use geometric-optics approximations for the QNMs, or they neglect the spin of the remnant.
Working in the nonrotating limit is a major limitation, since the final spin of the merger remnant is typically high [45][46][47][48]. Including spin in current approaches is challenging, especially because the geometry of spinning BHs beyond GR is known only perturbatively or numerically (see e.g. [49][50][51][52][53][54][55] for specific examples and [11,56,57] for reviews), which makes it very hard to compute the QNMs. In addition, there is in general no analog of the Teukolsky equation [6,58,59] beyond GR. In general the perturbation equations are not separable [60], and this requires the solution of an elliptic system of partial differential equations [61] or the extraction of QNM frequencies from numerical-relativity simulations of BH mergers [62][63][64][65].
In contrast to these major technical limitations, modeling the BH QNMs beyond GR is remarkably straight-forward. In any extension of GR, the QNMs of a BH can be parametrized as [21,22,27] where the frequency ω Kerr and damping time τ Kerr depend only on M and χ, whereas δω and δτ are generic deviations. We consider a modified ringdown which deviates perturbatively from the Kerr case in GR, i.e. δω ω Kerr and δτ τ Kerr . These departures can be due to extra charges, a modified theory of gravity, environmental effects, etcetera, and we wish to develop a generic framework that can accommodate various special cases. GR corrections might affect the ringdown in two ways: by predicting a spinning BH other than Kerr [11,12,15,50,51,53,57,66], or (even if GR BHs are still solutions of the theory) by affecting the dynamics of the perturbations [40,[67][68][69][70][71]. In both cases, the ringdown modes will acquire corrections proportional to the fundamental coupling constant(s) of the theory. There may be new classes of modes associated to extra polarizations, but they are unlikely to be significantly excited for GR deviations small enough to be compatible with existing observations [37,68,72]. For this reason they will not be considered in our analysis.
The above discussion suggests that a case-by-case analysis is impractical, and that parametrizing directly the observables (i.e., frequencies and damping times) is the most efficient way to perform ringdown tests (see e.g. [22,27,42,43] for work in this direction). Similar observable-based approaches have been very successful to model weak-field effects [73] and the inspiral [74,75].
In this paper we develop a scheme based on "Parametrized Ringdown Spin Expansion Coefficients" (ParSpec) which differs from related hierarchical approaches [76] and mode-stacking proposals [77]. Its salients features are as follows: 1) We expand the spectrum in a bivariate series in terms of the fundamental parameters (mass and spin) characterizing BH dynamics in GR.
2) The expansion parameters take into account the fact that modifications of GR are suppressed by a (possibly dimensionful) coupling constant.
3) Bayesian inference allows us to identify the most easily measurable expansion coefficients. By combining several observations, we can in principle map the deviation parameters to specific modified theories of gravity for which QNM spectra may be available. Since the third LIGO/Virgo observing run has been detecting BH mergers on a weekly basis, this approach holds the promise of allowing us to constrain several parameters (or identify deviations) as soon as the typical SNR of the observations becomes large enough.
The plan of the paper is as follows. In Section II we describe our parametrized framework. In Section III we illustrate the potential of the method by performing a statistical analysis on a representative catalog of merger events with LISA and 3G Earth-based detectors. In Sections IV and V we compare our framework with previous work and discuss directions for future research.

II. PARSPEC FRAMEWORK
Let us assume i = 1, . . . , N independent ringdown detections, for which q QNMs are measured. In general q depends on the source i, but for simplicity we shall consider a subset of all N T merger events for which the same number q of QNMs passes a certain SNR threshold. Therefore N is (in general) smaller than N T , but this is not a major limitation, given the high event rates expected for future detectors.
In terms of a standard spheroidal-harmonics decomposition [9], and depending on the intrinsic parameters of the progenitor binary (mass ratio and spins), typically the most excited QNMs 1 are the fundamental modes with l = m = 2, l = m = 3, and l = 2, m = 1. For simplicity we will assume the subdominant mode to be l = m = 3 for all N T sources; this assumption will be justified below. For a given (l, m), the overtones are in general relevant for parameter estimation [24,34,35,[78][79][80]. However, the frequencies of different overtones are very similar and hard to resolve [4,36], and therefore it is hard to use them for direct BH spectroscopy. For this reason, in this paper we will not consider overtones.
Rather than considering the corrections in Eqs.
(1) and (2) as independent parameters, it is sensible and convenient to reduce the dimensionality of the parameter space by performing a spin expansion. To this aim, we parametrize each mode of the i-th source as   are "beyond-Kerr" corrections to the QNM frequencies. Crucially, the latter are universal dimensionless numbers that do not depend on the source. Any possible source dependence is parametrized through γ i , as discussed below.
As customary in parametrized approaches, we focus on perturbative corrections by assuming γ i δw (n) 1, γ i δt (n) 1, and GR is recovered in the limit γ i → 0. We remark that in the parametrization (3), (4), M i and χ i are the BH masses (in the detector frame) and spins extracted assuming GR. In a non-GR theory, these are generally different from the actual BH masses and spins,M i ,χ i (see Appendix A). Therefore, the coefficients γ i , δw (n) J and δt (n) J also include the shift between M i , χ i and the physical masses and spins. Since M i and χ i refer to the GR values of the detector-frame mass and spin of the i-th source, they can be computed either from the full inspiral-merger-ringdown waveform within GR or from a measurement of the l = m = 2 mode with a standard GR ringdown template (without any assumption on the luminosity distance of the source). The remaining parameters in Eqs. (3) and (4) are discussed below for various special cases.

A. Special cases
Case I: scale-free corrections.
The simplest parametrized beyond-Kerr correction corresponds to having γ i = α for all sources, where α is a dimensionless coupling constant. Then α can be re-absorbed within δw (n) and δt (n) , and (assuming that M i and χ i are known within some parameter estimation uncertainty) we can parametrize the QNM spectrum beyond GR with P = 2(D + 1)q (5) parameters, where 2(D + 1)q is the total number of δw (n) and δt (n) parameters required if we consider q modes up to order D in the spin expansion.
Case II: single dimensionful coupling. A more general model consists of a single fundamental, dimensionful coupling constant α (the extension to multiple coupling constants is straightforward). Without loss of generality, we assume that α has mass dimensions [α] =M p , where p is fixed by the theory (for p = 0 we recover Case I above). HereM is the typical mass/length scale in the problem, which for a BH coincides with its mass in the source frame M s , as measured within GR (see Appendix A for a discussion). In this case, since the coefficients γ i are linear in the coupling, to leading order in our perturbative scheme are small dimensionless couplings that depend on the theory, on the source mass in the detector frame M i , and on the source redshift z i . The redshift can be estimated from the luminosity distance of the source, which can be extracted from the amplitude of the inspiral waveform (assuming the standard cosmological model 2 ). We will consider p as fixed (in modified theories of gravity it is typically an integer, or possibly a rational number) so that the number of parameters is the same as in Case I [cf. Eq. (5)]. Note that α can be again reabsorbed within δw (n) and δt (n) , but α is dimensionful if p = 0 (and so are δw (n) and δt (n) after the rescaling). Cases I (i.e., p = 0) and II include some of the best studied modified theories of gravity: • p = 0: theories with dimensionless couplings in the action include certain scalar-tensor theories, Einstein-Aether and Hořava gravity (to leading order) [83].
As discussed in [43], if different classes of gravitational perturbations are non-degenerate at order zero in the coupling parameter, the leading-order corrections to the QNM frequencies coming from O(α) terms in the action are O(α 2 ). In this case, an operator with mass dimension p/2 in the action will lead to a correction α 2 /M p in the QNMs. In the case of degenerate spectra (e.g., for axial and polar gravitational perturbations) the leading-order corrections to the QNMs coming from O(α) terms in the action are also O(α). The special cases discussed above correspond to terms in the action with mass dimension 0, 2 and 3, respectively. Case III: individual charges. Since the γ i 's appearing in Eq. (6) depend only on the fundamental coupling and the masses in the source frame, Cases I and II encompass BHs with secondary hair, but not BHs with primary hair (corresponding to an extra charge which does not depend on the mass and spin of the BH). A simple example of BHs with primary hair are Kerr-Newman BHs, which are useful and well-studied toy models for beyond-Kerr BHs, and may be astrophysically significant in certain darksector scenarios [89]. In the case of BHs with primary hair we have where Q i is the charge of the i-th source. The number of parameters necessary to parametrize the spectrum then becomes Our approach is perturbative by assumption (i.e. γ i 1). In the Kerr-Newman example, this means that it can only accommodate weakly charged BHs.

B. Detection strategies
In summary, we can parametrize the QNM spectrum beyond GR with P parameters in Cases I and II, and P parameters in Case III. On the other hand, in principle for N sources and q modes we have a certain number O of observables, which depend on whether we consider the ringdown only, or rather extract M i and χ i from the full inspiral-merger-ringdown waveform using numerical relativity fits (see e.g. [90]) or analytical models. Using the inspiral-merger-ringdown. In this case we measure the individual binary component properties from the inspiral-merger-ringdown waveform, and we use numerical-relativity fits in GR to evaluate the final masses (in the detector frame) and spins in GR, M i and χ i (i = 1, ..., N ). This procedure allows us to use only the l = m = 2 QNM to perform BH spectroscopy: it is essentially an extension of inspiral-merger-ringdown consistency checks that allows for a non-GR template in the ringdown.
In this case, for N sources and q modes we would have observables, i.e. the frequencies and damping times of q modes. Then we need N > D + 1 Cases I and II, in order to have more observables than parameters. In this case also q = 1 is allowed, i.e. the detection of the l = m = 2 mode for all sources is sufficient to perform the test. For the minimal case q = 1 (detection of one mode for each source) we get N ≥ D + 1 Cases I and II, 2(D + 1) Case III.
Using the ringdown only. For N sources and q modes we have observables, namely the frequencies and damping times of q modes minus 2N observables (the frequencies and damping times of the fundamental modes) which have been used to extract the GR masses and spin. By comparing O with the number of parameters (either P or P ), we need Cases I and II, in order to have more observables than parameters. Note that this condition is never satisfied for q = 1, as expected, since a single mode can only allow us to determine the masses and spins for each source. For the minimal case q = 2 (i.e., we are detecting 2 modes for each source) we get N ≥ 2D + 2 Cases I and II, 4D + 4 Case III.
Equation (13) implies N ≥ D + 1 in all cases in the limit q 1, i.e. when we can detect several modes for each source. Note that in this case we need twice of the sources that we needed in the inspiral-merger-ringdown case [Eq. (11) above]. Minimum number of sources. When the relations (11) and (14) are saturated, N is the minimum number of sources necessary to perform the test, whereas further sources will allow for multiple, independent tests. The minimum number of sources depends on the truncation order of the spin expansion.
In order to estimate the accuracy needed when we truncate the spin expansion, in Fig. 1 we compare numerical calculations of the QNM frequencies of a Kerr ) and their small-spin expansion for various truncation orders (see Table I) as a function of the BH spin (the value of D is indicated in the legend).
BH with their small-spin expansion at various truncation orders. For D ≥ 4 (resp., D ≥ 5) the errors are smaller than 1% for both the frequency (top panel) and damping time (bottom panel) when χ < 0.6 (resp., χ < 0.7). Therefore a truncation order D = 4 or D = 5 should be sufficient to compute the modes with an accuracy always better than 1% up to spin χ = 0.7. Reaching the same accuracy at χ = 0.8 will require D ≥ 7. As a proof of principle of the ParSpec formalism, in the following we shall consider D = 4 as a working assumption. We shall furthermore restrict to the simplest case (Case I), which minimizes the number of parameters and is more in line with existing parametrized tests, e.g. in the inspiral [73][74][75]. Hence, we require N ≥ 5 inspiral-merger-ringdown detections. Note that the minimum value of N grows only linearly with D: even a very large spin expansion order (e.g., D = 10) would require the same order of magnitude in terms of ringdown detections. Such a number of detections (even at large SNRs) may well be achievable with LISA and 3G detectors.

III. STATISTICAL ANALYSIS: CONSTRAINING THE BEYOND-KERR RINGDOWN PARAMETERS
We use Eqs.
(3)-(4), expanded up to fourth order in the spin, as templates to interpret the observed frequencies and damping times. We assume the true (ω i , τ i ) to correspond to a Kerr BH in GR, i.e. we assume the "true" beyond-Kerr parameters δw (n) and δt (n) to be zero.
Our goal is to reconstruct the probability distribution of the beyond-Kerr parameters. We consider either q = 1 or q = 2, i.e. either one or two modes 3 detected for each of the N sources. The purpose of this analysis is to compute the minimum value of the deformation parameters δw (n) and δt (n) which yield a ringdown observation consistent with GR. We consider a ground-based 3G detector (ET in the so called ET-D configuration [91]) and the planned space mission LISA [28] as representatives of our best near-future chances to carry out BH spectroscopy over a large mass range [31,92].
Each source (i = 1, 2, ..., N ) provides frequencies and damping times (ω i, obs injected in our analysis are computed as follows. We consider the merger remnant of N binary coalescences. The 2N masses of the binary components are drawn from a log-flat distribution between [5,95]M for stellar-origin BHs, and from a uniform distribution within [10 6 , 10 7 ]M for massive BHs. For stellar-origin BHs we also require that m 1 + m 2 < 100M [93,94]. In both mass ranges the spins are sampled from a uniform distribution ∈ [−1, 1]. For illustration, we fix the source distance by choosing the SNR of the first ringdown mode to be 10 2 for ET and 10 3 for LISA, respectively. We then compute the mass and the spin of the final BH formed after merger using semianalytical relations based on numerical relativity simulations in GR [90] (as discussed in Sec. II, these are the mass and spin that the final BH would have if GR is the correct theory of gravity). Given the final mass M i and spin χ i (i = 1 . . . N ), we compute the errors on the modes through a Fisher-matrix approach. Following [95], we roughly estimate the energy in the secondary mode to be 10% of the l = m = 2 mode energy for all sources. The main purpose of these choices is to test our data-analysis framework, and we plan to implement more astrophysically realistic models in future work.
We use a Bayesian approach to sample the probability distributions P of the model parameters θ for a given set of ringdown observations d. By Bayes' theorem P ( θ| d) ∝ L( d| θ)P 0 ( θ), where L( d| θ) is the likelihood function and P 0 ( θ) is the prior on the parameters. For each event, the likelihood function is chosen to be Gaussian: where the vector µ i depends on the difference between the observed J = 1, . . . q modes and the parametrized templates (3)-(4): where each µ and Σ i is the covariance matrix that includes errors and correlations between the frequencies and damping times measured for the i-th source. Under our assumptions the observed QNMs correspond to different values of l and m, i.e. they are "quasi-orthonormal" in the terminology of Ref.
[4]. As a consequence the covariance ma- i ) is block-diagonal with each block corresponding to the J-th mode, and the likelihood function can be written as a product of Gaussian distributions: Moreover, given N independent BH detections, the combined likelihood function of the ParSpec parameters can be further factorized as The full posterior is obtained through a Markov chain Monte Carlo (MCMC) method based on the Metropolis-Hastings algorithm [96], in which the proposal matrix is updated through a Gaussian adaptation which enhances the convergence to the target distribution [97,98]. For each dataset we compute 4 chains of 5×10 6 points, with a thinning factor of 0.02 to reduce the correlation between the samples. We discard 10% of the initial points as a burn in.
The beyond-Kerr dimensionless parameters θ = {δw Moreover, for simplicity we assume the parameter p associated with the mass dimension of the coupling to be p = 0. We defer a more detailed investigation for different values of p to future works. We remark that when p > 0 the dimensionless coupling γ i = α/M p is much smaller for massive BH mergers, and therefore we expect QNM frequency corrections for LISA sources to be much smaller than those for stellar-origin BHs. 1 . We also focus on the first strategy discussed in Sec. II B, i.e. we assume that the BH masses and spins are measured using the full inspiral-mergerringdown signal. Then the minimum number of events required to perform our test is N min = D + 1 = 1.
The top and right panels of Fig. 2 show the inferred marginalized distributions of the two ParSpec parameters as a function of the number N of stellar mass sources detected by ET. The posteriors are peaked around zero and, as expected, they become narrower as N grows. In the most optimistic case we consider (N = 100) we find |δw  1 | 3.4 × 10 −3 at 90% confidence level. The distributions are nearly symmetrical around the peak, and therefore we can define their width σ as half of the corresponding confidence interval. We have fitted this width as a function of N , finding that it scales like ∼ N −1/2 to a very good approximation. The contour plots in the bottom-left panel of Fig. 2 show 90% confidence intervals for the 2D joint distribution of the two parameters, and they show that the parameters are almost completely uncorrelated. Our approach can accommodate an arbitrary number of modes. As a slightly more complex scenario, we still set D = 0 but we now consider the observation of the primary (l = m = 2) and secondary (l = m = 3) QNM for each BH (i.e., q = 2), thus doubling the number of parameters that we wish to constrain. Figure 3 shows the width σ of the sampled posteriors as a function of N . The smallest values of σ (i.e., the strongest bounds) correspond to the frequency corrections δw 2 ), which are typically harder to measure. Most importantly, Figure 3 shows that some of the Par-Spec parameters can become measurable by increasing the number of observations N : for example, for N = 1 the marginalized distribution of δt (0) 2 is flat within the allowed range of values, and hence unconstrained by the data, but this quantity can be constrained for larger values of N . As in the q = 1 case considered above, we find that the widths of all parameters is very well fitted by a ∼ N −1/2 scaling when N 100. Moreover, we do not find evidence for significant correlations between any of the δw We can turn to the more realistic scenario of spinning BHs. We truncate the expansion at D = 4, so that modes in GR are estimated with an accuracy better than 1% up to spins χ ∼ 0.6 (see Fig. 1   with q = 1 and q = 2 are shown in the top and bottom panels of Fig. 4, respectively. Consider first q = 1 (top panel). Even with a large number of detections N , only some of the parameters are measurable. In general we can constrain with good accuracy the first three δw (n) 1 's, i.e. GR deviations up to quadratic order in the spin (n = 0, 1, 2). By contrast, only the non-spinning correction δt (0) 1 to the damping times is bounded by the data.
Note however that δw

Modes
FIG. 5. Same as Fig. 4, but for massive BHs observed by LISA.
ParSpec parameters for q = 1. The spin-dependent frequency corrections show a correlation which generally decreases with N , while δt (0) 1 is typically uncorrelated with the other coefficients.
The q = 2 case (bottom panel of Fig. 4) is very similar: the width of the posteriors inferred through the MCMC decreases with N . The hierarchy among the beyond-Kerr parameters is also the same as in the single-mode case: zero-order (nonspinning) terms are best constrained, followed by corrections that are of low order in rotation. Remarkably, with N = 100 sources we can put tight upper bounds on the coefficient of the secondary mode, with |δw These results can be straightforwardly adapted to the second detection strategy outlined in Sec. II B. In this case we assume that for each observation we extract two QNMs from the postmerger GW signal, using the frequency and damping time of the fundamental mode to determine the mass and the spin of the source. This scenario is comparable to the single-mode case described above, but now we inject into the MCMC the subdom- inant QNM, which has lower SNR. Therefore we expect that the bounds shown in Fig. 3 and in the top panel of Fig. 4 would worsen by roughly one order of magnitude.

B. Projected constraints with LISA
We now perform a similar Bayesian analysis for LISA QNM observations of massive BH binary mergers. For nonrotating BHs (D = 0) the reconstructed posteriors are nearly a factor of ten tighter than the corresponding distributions for ET. This is somehow expected, since the parameters are nearly uncorrelated and the inference is dominated by the SNR of the detections (which we assumed to be one order of magnitude larger for LISA than for ET). Figure 5 shows the width of the posteriors for spinning BHs with spin corrections up to D = 4 and q = 1 (top panel) or q = 2 (bottom panel). The values of the upper bounds on the ParSpec parameters are in qualitative agreement with those obtained for ET in Fig. 4.
To facilitate comparisons, in Fig. 6 we show the posterior distributions inferred from a sample of N = 100 observations with ET and LISA, assuming q = 1. At least in the case of scale-free corrections considered in this work (p = 0), LISA constraints are more stringent. For the best constrained parameters, the 90% confidence intervals with LISA are |δw (0) 1 | 3.2 × 10 −4 (|δt (0) 1 | 1.3 × 10 −3 ), which are smaller than the corre-sponding values obtained for ET by a factor ∼ 6 (∼ 5). For the upper bounds on the spinning coefficients we get |δw (1) 1 | 1.2 × 10 −2 and |δw (2) 1 | 9.3 × 10 −2 which are ∼ 4 and ∼ 3 times smaller than those inferred by ET. The corner plot in Fig. 8 of Appendix B shows the complete set of marginalized and joint distributions derived for such parameters. As already discussed in Sec. III A, the coefficients δw (J) i that modify the mode's frequencies are all correlated to each other, while the correction to the damping time is almost decoupled from the other parameters.
The two-mode analysis (q = 2) follows the same trend. Moreover, comparing the bottom panels of Figs. 4 and 5 we note that -unlike ET -LISA will be able to constrain possible deviations from the primary and secondary modes with comparable accuracy.

IV. POSSIBLE EXTENSIONS
In this work we have presented a data-analysis framework (that we dub ParSpec ) and performed a preliminary analysis. Here we discuss several interesting extensions that should be explored in the future.
In our proof-of-principle data-analysis demonstration we consider only scale-free corrections, i.e. p = 0. The extension to different values of p (and hence to dimensionful couplings) is technically straightforward, but introducing a scale inevitably makes certain sources more relevant than others. Specifically, for a coupling parameter α with mass dimension p, sources with (source-frame) mass M s such that α/(M s ) p ∼ 0.1 will contribute the most, whereas sources with α/(M s ) p 1 will be irrelevant for the analysis. We could simply consider only the subset of events such that α/(M s ) p is larger than a fixed threshold. Overall, this would require more detections.
The assumption that our "true" signal is the standard ringdown within GR allows us to put at most upper bounds on the beyond-Kerr ringdown parameters, but we can search directly for GR deviations by writing the beyond-Kerr ringdown parameters explicitly for a given theory. As an extension, it would be interesting to consider a particular non-GR theory and to recover the ringdown signal in this theory with a standard GR ringdown template, in order to quantify systematic errors [99].
Additional "branches" of the QNM spectrum which are not perturbatively close to the Kerr spectrum are expected in virtually any extension of GR, although they might be excited with small amplitude (see e.g. [40, 62-64, 68, 72]). To leading order, the extra modes coincide with the corresponding QNMs of a Kerr BH in GR: for example, extra scalar (vector) degrees of freedom can give rise to standard scalar (vector) QNMs in the gravitational waveform, with amplitude proportional to the coupling parameter of the theory [40,63,68,72]. Our formalism can accommodate extra QNMs, which can be parametrized with Eqs. (3)-(4) with γ i = 0 by setting w (n) and t (n) to match the corresponding values for the (scalar, vector, etcetera) QNMs of a Kerr BH in GR (but see [43] for possible complications arising when different perturbations are coupled to each other).
Some theories of gravity may have multiple coupling constants, rather than the single perturbative parameter considered here. It is straightforward to extend our formalism to this case.
Our spin expansion is necessary to parametrize the ringdown in terms of a set of constant coefficients (as opposed to functions of the spin). The resulting systematic errors can be reduced by considering higher-order expansions than the D = 4 case considered here. To check the impact of the truncation order, we have also considered a spin expansion truncated at D = 6. In our tests, the posterior distribution of the measurable parameters did not change significantly relative to the D = 4 case.
An obvious and important extension of our work is to compute rates for both 3G and LISA sources using more realistic astrophysical models. In this preliminary analysis we have assumed 10 − 100 events at SNR ∼ 100 (∼ 1000) for ET (LISA), corresponding to nearby sources. It is important to estimate whether these estimates are realistic. Since we select only large-SNR sources, which are generally the closest ones, we have neglected the source redshift. Even the closest LISA ringdown sources may have nonnegligible redshift, and therefore cosmological effects should be included in a more refined analysis.
We assume that two different angular modes are detected for each sources. The extension to multiple angular modes is straightforward, and in general it should lead to stronger constraints. Likewise, in realistic scenarios not only the amplitude of the secondary mode, but also its nature will depend on the binary parameters (mass ratio and spins): in general the second most exited mode will correspond to l = m = 3 only for a subset of sources, whereas the mode with l = 2, m = 1 and l = m = 4 may be dominant for others. Future work should extend our parameter estimation strategy to the case of multiple, source-dependent secondary modes.
For any given (l, m) we consider only the fundamental mode, neglecting the overtones. In general, overtones are relevant for parameter estimation [24,34,35,[78][79][80]. However, the frequencies of different overtones are very closely spaced and hard to resolve [4,36], and therefore -besides consistency tests with mass/spin inferred from the whole waveform [34,35] -it is hard to use them for direct BH spectroscopy. If the ringdown SNR is very high (as expected for 3G detectors and LISA) [36] the overtones may be resolved, and therefore they should be included in our model.

V. CONCLUSIONS
Ringdown tests and BH spectroscopy will allow us to place much tighter constraints on strong-field gravity when high-SNR BH merger detections will become routine, as expected for LISA and 3G interferometers. We have introduced an approach based on "Parametrized Ringdown Spin Expansion Coefficients" (ParSpec) to parametrize beyond-GR deviations from the standard QNM ringdown of a Kerr BH in Einstein's theory. We demonstrated that this method can be used to constrain a large number of beyond-Kerr ringdown parameters using multiple ringdown observations.
Our main results can be summarized as follows: • At variance with previous frameworks (e.g., [22]), in ParSpec the ringdown parameters can be mapped to virtually any (perturbative) extension to GR. The framework is perturbative in the spin but can be made arbitrarily precise -at least in principlethrough high-order spin expansions.
• We estimate that for a spin expansion of order six or higher (D ≥ 6), truncation errors are below 1% for spins χ 0.7 (see Fig. 1).
• The number of beyond-GR parameters can be very large (especially in the case of a high-order spin expansions), but we can use Bayesian inference to identify the most easily measurable expansion coefficients. It turns out that O(10) ringdown detections at SNR ∼ 100 (as achievable with ET and Cosmic Explorer) can constrain the beyond-Kerr parameters associated to zeroth-and firstorder corrections in the spin, whereas constraining the second-order in spin coefficients will require O(10) ringdown detections at SNR ∼ 1000, something that could be achievable with LISA.
• An important consequence of this observation is that, even including beyond-Kerr parameters up to D = 4 in the spin, only those with D ≤ 2 can be actually measured in the foreseeable future.
• The method can automatically accommodate an arbitrary number of sources. As expected, the posterior distribution becomes narrower as the number of events N increases. Their width scales approximately as σ ∼ N −1/2 when N 100 (the accuracy of this scaling improves when the number of parameters is not too large). Interestingly, the number of beyond-Kerr parameters that can be measured increases with N . Furthermore, as the number of sources increases it could be possible to perform multiple and independent checks.
Future work will focus on a more systematic analysis of ParSpec along the lines discussed in Sec. IV. In this appendix we show that -for theories which are perturbatively close to GR -the QNM frequencies and damping times are given by Eqs. (3) and (4), where M i and χ i are the mass (in the detector frame) and spin of the i-th source, both measured assuming GR.
In general, the QNM parametrization can be written similarly to Eqs.