Continuous time‐interaction processes for population size estimation, with an application to drug dealing in Italy

We introduce a time‐interaction point process where the occurrence of an event can increase (self‐excitement) or reduce (self‐correction) the probability of future events. Self‐excitement and self‐correction are allowed to be triggered by the same event, at different timescales; other effects such as those of covariates, unobserved heterogeneity, and temporal dependence are also allowed in the model. We focus on capture‐recapture data, as our work is motivated by an original example about the estimation of the total number of drug dealers in Italy. To do so, we derive a conditional likelihood formulation where only subjects with at least one capture are involved in the inference process. The result is a novel and flexible continuous‐time population size estimator. A simulation study and the analysis of our motivating example illustrate the validity of our approach in several scenarios.


INTRODUCTION
Models for capture-recapture data are commonly used to estimate the size of a population that is difficult to sample without bias. Such models find a wide range of applications (Otis et al., 1978; Chao et al., 2001; Silverman, 2020). In this work, we focus on cases in which sampling occurs in continuous time, and the occurrence of sampling can trigger a complex behavioral response that might affect the future sampling risk. Continuous-time models for population size estimation are well established (Chao and Lee, 1993; Yip et al., 1996; Lin and Yip, 1999; Hwang and Chao, 2002; Xi et al., 2007; Farcomeni and Scacciatelli, 2013; Matechou and Caron, 2017; Liu et al., 2018; Schofield et al., 2018) and often involve modeling of the temporal point process of capture/identification/sighting (Illian et al., 2008; Møller et al., 1998). Our motivating application involves assessing the extent of illegal drug trafficking in Italy. In this work, we consider an original dataset about drug dealers identified in Italy between 2005 and 2006. This maps to a continuous-time experiment, with dealers continuously at risk of being observed by Italian police officers from time zero to the end of the available records. Clearly, not all drug dealers have been identified during the observation period. More importantly, we can assume that an interaction with the Italian police brings about a transient and complex behavioral response. This will be numerically verified on the data at hand; ignoring this effect might lead to biased estimates. For other examples of innovative population size estimation methods to assess the extent of drug abuse, see Overstall et al. (2014), Farcomeni and Scacciatelli (2013), Huggins et al. (2016), Hay and Richardson (2016), and references therein. Our final estimate for the number of drug dealers can be of interest to both law enforcement and public health authorities. The latter are often mainly interested in the number of illicit drug users for monitoring, planning, and policy; indeed, many official estimates of the number of illicit drug users are based on dealer/consumer ratios (e.g., Bouchard and Tremblay, 2005), which must rely on accurate estimates of the number of dealers.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2022 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of the International Biometric Society.
In the temporal point process framework, behavioral effects correspond to conditional dependence of event rates on the times of previous events. This can result in self-exciting (self-correcting) processes, where the occurrence of an event at a time point affects the probability of occurrence of a later event in an upward (downward) fashion. Self-exciting processes are also known as Hawkes processes (Hawkes, 1971). Applications of Hawkes processes span environmental and seismological data (Ogata, 1999), health and genomic studies (Reynaud-Bouret and Schbath, 2010), crime (Mohler et al., 2011), and other areas. Self-correcting processes (Isham and Westcott, 1979; Vere-Jones and Ogata, 1984) have been applied mostly to earthquake data (Rotondi and Varini, 2019; Schoenberg and Bolt, 2000).
Our methodological contribution in this work is twofold. First, we contribute to the literature on temporal point processes by defining a combination of self-exciting and self-correcting point processes, allowing each event to increase and/or decrease the likelihood of future ones. A similar proposal can be found in Bonnet et al. (2021), but we believe our specification to be more appropriate for the context of population size estimation. Hawkes and self-correcting processes may be derived as special cases, and we believe the resulting general model to be of independent interest. Second, we show how self-exciting and self-correcting phenomena can be used to specify flexible behavioral responses in continuous-time capture-recapture models. To do so, we optimize the conditional likelihood, as only subjects with at least one capture are observed. The risk of capture of each individual can be thought to depend on a combination of four components: (i) observed heterogeneity (i.e., covariates, producing a class of models known as $M_c$ models), (ii) unobserved heterogeneity (i.e., unmeasured covariates, $M_h$ models), (iii) time effects that are independent of the subject ($M_t$ models), and (iv) behavioral response to capture ($M_b$ models), where the occurrence of being observed can temporarily or permanently change the risk of capture. A combination of the above features is possible, and the most general model is denoted as $M_{chtb}$ (Otis et al., 1978; Farcomeni and Scacciatelli, 2013; Farcomeni, 2016). Therefore, our contribution to the continuous-time capture-recapture literature is a flexible and parsimonious behavioral effect, with direct interpretation and applicability for both human and animal populations, embedded in a unified framework that can also take into account all other sources of heterogeneity.
The rest of the paper is organized as follows: in the next section, we introduce our novel time-interaction process and define the observed and conditional likelihoods. This corresponds to model $M_b$ in the population size estimation framework. In Section 3, we specify a general model for the $M_{chtb}$ case. The performance of the new approach is assessed via a simulation study and successfully compared to some existing methods for population size estimation in Section 4. The analysis of our motivating example is reported in Section 5. Concluding remarks are presented in Section 6.
The methods presented in this work have been implemented in R code, which may be found online in the Supporting Information. The data used for the present work must be organized in a list where each element identifies a subject captured at least once; for each subject, a vector of capture times is provided. Optional subject-specific covariates may be included in a separate covariate matrix.

TIME-INTERACTION PROCESSES
A continuous-time point process is a stochastic process modeling a temporal point pattern $\{t_1, \dots, t_n\}$; that is, a sequence of times of occurrence for an event of interest. The corresponding counting process $N(t)$ records the number of events in the interval $(0, t)$, taking values in $\mathbb{N}$. The number of points over a time interval depends on interval length but also on an intensity function that may take a complex form. Under reasonable assumptions, this leads to $N(T) \sim \mathrm{Poisson}(\Lambda(T))$, where $\Lambda(T) = \int_0^T \lambda(t)\,dt$ and the conditional intensity function $\lambda(t)$ satisfies
$$\lambda(t) = \lim_{h \to 0} \frac{E[N(t+h) - N(t) \mid \mathcal{H}(t)]}{h},$$
with $\mathcal{H}(t)$ being the history of event times up to $t$. For capture-recapture data in continuous time, $N(t)$ represents the number of captures for one generic subject in $(0, t)$. Conditional intensities may or may not be constant across individuals. In a self-exciting process, the occurrence of an event causes the conditional intensity function to increase in the short term. In capture-recapture contexts, this generates the so-called trap-happiness effect. The intensity is modeled as (Hawkes, 1971)
$$\lambda(t) = \lambda + \sum_{t_j < t} g(t - t_j), \qquad (1)$$
for some smooth self-exciting kernel $g(\cdot)$. A standard choice for $g(\cdot)$ is the exponential kernel $g(t) = \sum_{l=1}^{L} \alpha_l e^{-\beta_l t}$, for some $L \ge 1$. A stationarity condition for the exponential kernel is that $\sum_l \alpha_l/\beta_l < 1$, to avoid explosion of the intensity function. In practice, the exponential kernel with $L = 1$ is often flexible enough to provide a good fit (Oakes, 1975). Model (1) is not suitable for taking trap-shyness into account, that is, for considering situations where the conditional intensity function drops after an event. This can be described by self-correcting kernels, where the intensity increases in the absence of events and is decreased by each event. The conditional intensity of a self-correcting process may be expressed as (Isham and Westcott, 1979)
$$\lambda(t) = \exp\{\rho(\mu t - N(t))\}, \qquad (2)$$
where $\rho$ is a parameter controlling the degree of trap-shyness and $\mu t$ is a target for the number of events at time $t$.
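The exponential-kernel Hawkes intensity and its stationarity condition can be sketched as follows. This is an illustrative Python snippet (the paper's implementation is in R); the function names are ours.

```python
import math

def hawkes_intensity(t, history, lam, alphas, betas):
    # lambda(t) = lam + sum over past events t_j < t of the exponential
    # kernel g(t - t_j) = sum_l alphas[l] * exp(-betas[l] * (t - t_j))
    excitation = sum(
        a * math.exp(-b * (t - tj))
        for tj in history if tj < t
        for a, b in zip(alphas, betas)
    )
    return lam + excitation

def is_stationary(alphas, betas):
    # the branching ratio sum_l alphas[l] / betas[l] must be < 1
    # to avoid explosion of the intensity
    return sum(a / b for a, b in zip(alphas, betas)) < 1.0
```

With an empty history the intensity reduces to the baseline rate; each past event adds an exponentially decaying contribution.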
We here propose a unified time-interaction process, whose conditional intensity combines a self-exciting and a self-correcting component (Equation 3). In (3), $\lambda$, $\alpha$, $\beta$, and $\gamma$ are positive, and $\alpha < \beta$. It is worth noticing that we are specifying an additive hazard counting process model (Aalen, 1989; Lin and Ying, 1995).

Inference
Suppose we have observed $n_i$ events for each subject $i = 1, \dots, n$, with subject-specific capture times $\{t_{i1}, \dots, t_{ij}, \dots, t_{in_i}\}$, over an interval $(0, T)$. The log-likelihood for the $i$th subject is
$$\ell_i(\theta) = \sum_{j=1}^{n_i} \log \lambda_i(t_{ij}) - \int_0^T \lambda_i(u)\,du, \qquad (4)$$
where $n_i = N_i(T)$, and by convention $t_{i0} = 0$ and $t_{i(n_i+1)} = T$. Under independence, the observed log-likelihood (5) corresponds to the sum of (4) over subjects. For our time-interaction process, the subject-specific log-likelihood becomes (6), where $\theta = (\lambda, \alpha, \beta, \gamma)$ is a short-hand notation for the parameters involved. In order to maximize the log-likelihood, the gradient is needed. This can be derived in closed form, as reported in the Web Appendix. The Hessian can be obtained through a numerical first derivative of the gradient. A Newton-Raphson method proceeds after operating a change of variables so that the positive parameters in $\theta = (\lambda, \alpha, \beta, \gamma)$ are replaced by real-valued parameters in $\theta' = (\lambda', \alpha', \beta', \gamma')$, where $\lambda = \exp(\lambda')$, $\alpha = \exp(\alpha')$, $\beta = \exp(\alpha') + \exp(\beta')$, and $\gamma = \exp(\gamma')$, so that the constraint $\alpha < \beta$ is automatically satisfied. The value of the Hessian at convergence can be used to obtain standard errors of the parameters as usual. Further details on maximum likelihood estimation are given in the next section.
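The general form of (4) can be illustrated for the simplest special case, a Hawkes process with constant baseline and a single exponential kernel, for which the compensator is available in closed form. This Python sketch is illustrative only and does not implement the paper's full time-interaction intensity.

```python
import math

def loglik_hawkes(times, T, lam, alpha, beta):
    # one subject's log-likelihood over (0, T):
    # sum_j log lambda(t_j) minus the compensator Lambda(T)
    ll = 0.0
    for j, tj in enumerate(times):
        # intensity just before the j-th event: baseline plus
        # excitation from strictly earlier events
        lam_tj = lam + sum(alpha * math.exp(-beta * (tj - tk)) for tk in times[:j])
        ll += math.log(lam_tj)
    # closed-form compensator for the exponential kernel
    compensator = lam * T + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (T - tj))) for tj in times
    )
    return ll - compensator
```

With no events the log-likelihood reduces to $-\lambda T$, the log-probability of observing an empty pattern under the baseline rate.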

Conditional likelihood and population size estimation
In the capture-recapture framework, we can only observe subjects with $n_i > 0$. We shall resort to the conditional likelihood (Sanathanan, 1972), which conditions on the event that there is at least one capture. Since $N_i(T)$ is distributed like a Poisson, and for a subject with no events the self-exciting term vanishes while $N_i(t) = 0$, the probability of such an event is $\Pr(n_i > 0) = \Pr(N_i(T) > 0) = 1 - \exp\{-\int_0^T \lambda \exp(\gamma t)\,dt\} = 1 - \exp\{-\lambda \frac{\exp(\gamma T) - 1}{\gamma}\}$. Consequently, the conditional log-likelihood is given by
$$\ell^*(\theta) = \ell(\theta) - n \log \Pr(N(T) > 0). \qquad (7)$$
This corresponds to an $M_b$ specification, as commented in the previous section. The closed-form expression for the gradient of (7) is given in the Web Appendix and can be used to set up optimization as described for the observed likelihood (5).
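The no-capture probability and the conditional correction in (7) can be sketched as follows (illustrative Python; function names are ours).

```python
import math

def prob_at_least_one(lam, gamma, T):
    # Pr(N(T) > 0) = 1 - exp{-lam * (exp(gamma * T) - 1) / gamma}:
    # a never-captured subject keeps intensity lam * exp(gamma * t),
    # whose integral over (0, T) is lam * (exp(gamma * T) - 1) / gamma
    cum = lam * (math.exp(gamma * T) - 1.0) / gamma
    return 1.0 - math.exp(-cum)

def conditional_loglik(observed_loglik, n_observed, p_pos):
    # conditional log-likelihood: subtract n * log Pr(at least one capture)
    return observed_loglik - n_observed * math.log(p_pos)
```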
The final goal in a capture-recapture context is to provide an estimate for the total number of subjects $N$. In our implementation, we compute the Horvitz-Thompson estimator, whose expression is
$$\hat{N} = \sum_{i=1}^{n} \frac{1}{\widehat{\Pr}(N_i(T) > 0)}. \qquad (8)$$
Details about the standard error of $\hat{N}$ are given in Section 3.2.
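The Horvitz-Thompson estimator weights each observed subject by the inverse of its estimated capture probability; a minimal sketch:

```python
def horvitz_thompson(capture_probs):
    # N_hat = sum over observed subjects of
    # 1 / Pr(subject captured at least once)
    return sum(1.0 / p for p in capture_probs)
```

For example, three observed subjects with capture probabilities 0.5, 0.5, and 0.25 yield an estimated population size of 8.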

INCLUDING TIME EFFECTS, OBSERVED HETEROGENEITY, AND UNOBSERVED HETEROGENEITY
The model outlined previously only takes into account behavioral effects, which might lead to clustering (self-excitement) or spreading (self-correction) of events over time. A general model must consider three further sources of variation in capture rates: time-dependent effects, observed heterogeneity, and unobserved heterogeneity. To this end, we propose the formulation for the intensity function given in (9), where $x_i$ is a $p \times 1$ vector of subject-specific, time-constant covariates, and the associated vector of regression coefficients captures observed heterogeneity ($M_c$ part). Function $\lambda_0(t)$ is a parametric baseline hazard, with $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$. In the following, we set $\lambda_0(t) = \lambda\kappa t^{\kappa - 1}$, which is the baseline hazard for the Weibull distribution. Different baselines can in principle be specified and compared through likelihood-ratio tests (e.g., Fine, 2002; see also our simulation results). Exogenous time dynamics are captured by the parameter $\mu$, which tunes the target number of events for the self-correcting part ($M_t$). Finally, unobserved heterogeneity ($M_h$) involves the use of random subject-specific effects. In the present specification, we include a frailty term representing unobserved factors that might modify the individual risk of capture; we also make both behavioral effects subject specific, allowing, for example, part of the population to be trap-shy and part of it to be trap-happy in the short term. Several constraints can be used to obtain any possible submodel (e.g., by fixing selected parameters to zero and/or one). It is also possible to exclude self-exciting or self-correcting behaviors by setting the corresponding parameters to zero. The model is completed by the specification of a distribution for subject-specific effects. In this work, we use latent class models (see, e.g., Coull and Agresti, 1999). We assume the existence of a discrete latent variable $z_i$, with $\Pr(z_i = k) = \pi_k$ for $k = 1, \dots, K$. Parameters $\lambda$, $\alpha$, $\beta$, and $\gamma$ take values in a discrete set of $K$ elements as indexed by $z_i$.
Therefore, $\lambda \in \tilde{\lambda} = (\tilde{\lambda}_1, \dots, \tilde{\lambda}_K)$, $\alpha \in \tilde{\alpha} = (\tilde{\alpha}_1, \dots, \tilde{\alpha}_K)$, $\beta \in \tilde{\beta} = (\tilde{\beta}_1, \dots, \tilde{\beta}_K)$, and $\gamma \in \tilde{\gamma} = (\tilde{\gamma}_1, \dots, \tilde{\gamma}_K)$. We assume vector $\tilde{\lambda}$ to be in increasing order. This constraint, in addition to $\alpha < \beta$ and $\lambda, \alpha, \beta, \gamma > 0$ specified in Section 2, ensures the identifiability of the extended model.
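The Weibull baseline and a covariate effect can be illustrated as follows. This Python sketch is an assumption-laden illustration: the log-linear covariate term is a common choice, not necessarily the paper's exact specification.

```python
import math

def weibull_baseline(t, lam, kappa):
    # Weibull baseline hazard lam * kappa * t^(kappa - 1);
    # its integral Lambda_0(t) = lam * t^kappa has closed form
    return lam * kappa * t ** (kappa - 1.0)

def covariate_rate(t, x, coefs, lam, kappa):
    # observed heterogeneity illustrated through a log-linear term
    # exp(x' coefs) multiplying the baseline (illustrative choice)
    return math.exp(sum(xj * cj for xj, cj in zip(x, coefs))) * weibull_baseline(t, lam, kappa)
```

With $\kappa = 1$ the baseline reduces to a constant hazard, which is the "no time effect" submodel.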

Estimation of model parameters
The time-interaction $M_{chtb}$ model (9) has $5K + p$ parameters: $4K + p + 1$ elements in $\theta$, plus $K - 1$ free latent class probabilities $\pi_k$. The log-likelihood for subject $i$ conditional on a generic class $k$ is given in (10). The observed likelihood is therefore $\prod_i \sum_k \pi_k \exp\{\ell_i(\theta \mid k)\}$. The latter is slightly cumbersome to compute and might lead to numerical issues during optimization. For this reason, we prefer setting up an expectation-maximization (EM) algorithm, which is based on the complete log-likelihood, involving the indicator function $1(z_i = k)$. The latter is the likelihood one would observe in case the discrete latent variables were measured. The complete conditional log-likelihood corresponds to the analogous sum over subjects and latent classes. The E step consists of computing the posterior expected complete log-likelihood, given observed time patterns and the current parameter values. Since the complete likelihoods are linear in $1(z_i = k)$, the E step involves simply replacing the indicators with their posterior expectations $\hat{w}_{ik}$ (12); an analogous expression is obtained for the conditional log-likelihood. At the M step, a closed-form expression is obtained for updating the probability masses as $\hat{\pi}_k = \sum_i \hat{w}_{ik}/n$. For all other parameters, we use a numerical optimization algorithm at each iteration. A deterministic starting solution is obtained by performing cluster analysis of the time patterns to assign initial latent classes, and then estimating our model with $K = 1$ on each subgroup. Additional starting solutions are obtained through random perturbations of the deterministic solution.
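A single EM iteration for the latent class probabilities can be sketched as follows (illustrative Python; only the class-probability update is shown, since all other parameters require numerical optimization).

```python
import math

def em_step(loglik_by_class, pis):
    # loglik_by_class[i][k]: log-likelihood of subject i under class k.
    # E step: posterior weights w_ik proportional to pis[k] * exp(loglik).
    # M step (class probabilities only): pi_k = average of w_ik over subjects.
    n, K = len(loglik_by_class), len(pis)
    weights = []
    for row in loglik_by_class:
        logs = [math.log(pis[k]) + row[k] for k in range(K)]
        m = max(logs)  # log-sum-exp trick for numerical stability
        unnorm = [math.exp(l - m) for l in logs]
        s = sum(unnorm)
        weights.append([u / s for u in unnorm])
    new_pis = [sum(w[k] for w in weights) / n for k in range(K)]
    return weights, new_pis
```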

Estimation of the population size
In order to estimate the population size with our general $M_{chtb}$ model (or its submodels), we use a Horvitz-Thompson type estimator. This would involve a plug-in estimate of (11), but in our experience the resulting estimates are sometimes unstable (see also Farcomeni and Scacciatelli, 2013, on this point). For this reason, we use the maximum a posteriori (MAP) strategy, which we will now describe.
First, we use the posterior probabilities (12) at convergence to assign each subject to its most likely latent class, denoted as $k_i^*$. Then, we compute the Horvitz-Thompson estimator (14), with each subject's capture probability evaluated at the parameters of its MAP class. In our experience (both on simulated and real data), such a procedure leads to stable population size estimates for sampling fractions as small as 5%. There are additional methods that can be used, if necessary, to improve the stability of population size estimates, such as trimming or truncating large weights. For a complete survey, refer to Chen et al. (2017).
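The MAP-then-Horvitz-Thompson step can be sketched as follows (illustrative Python; the class-specific capture probabilities would come from the fitted model).

```python
def map_population_estimate(posterior, capture_prob_by_class):
    # assign each observed subject to its most likely latent class (MAP),
    # then add 1 / capture probability under that class (Horvitz-Thompson type)
    n_hat = 0.0
    for w in posterior:
        k_star = max(range(len(w)), key=lambda k: w[k])
        n_hat += 1.0 / capture_prob_by_class[k_star]
    return n_hat
```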
We now focus on the standard error of (14). As noted in Van der Heijden et al. (2003) and Böhning (2008), the variance of (14) can be decomposed into two addends (Equation 15), acknowledging that extra variability is brought about by the random sample size. It is straightforward to check (Farcomeni and Scacciatelli, 2013) that in our case one can express $\hat{N} = g(\hat{\theta})$ as a function of the model parameters and use the Delta method to estimate $E[\mathrm{Var}(\hat{N} \mid n)]$ as $\nabla g(\hat{\theta})' I(\hat{\theta})^{-1} \nabla g(\hat{\theta})$, where $\nabla g(\hat{\theta})$ denotes the gradient of $\hat{N}$ as a function of $\hat{\theta}$, and $I(\hat{\theta})$ is the observed Fisher information. The latter is a direct by-product of the model estimation procedure, while the former has been in part derived in closed form and is reported in the Web Appendix. All other cases can be easily derived from the expressions reported as Supporting Information in the Web Appendix. Finally, the second addend of (15) can be approximated as in Section 2.1 of Böhning (2008); see also Van der Heijden et al. (2003) and Farcomeni and Scacciatelli (2013).
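The Delta method quadratic form can be sketched generically, with the gradient approximated by central differences (illustrative Python; `fisher_inv` stands for the inverse observed Fisher information).

```python
def delta_method_var(g, theta_hat, fisher_inv, eps=1e-6):
    # Var(g(theta_hat)) ~= grad' * I(theta_hat)^{-1} * grad,
    # with the gradient approximated by central differences
    p = len(theta_hat)
    grad = []
    for j in range(p):
        up = list(theta_hat); up[j] += eps
        dn = list(theta_hat); dn[j] -= eps
        grad.append((g(up) - g(dn)) / (2.0 * eps))
    # quadratic form grad' * fisher_inv * grad
    return sum(
        grad[j] * sum(fisher_inv[j][k] * grad[k] for k in range(p))
        for j in range(p)
    )
```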

SIMULATION STUDY
In this section, we report a simulation study assessing the comparative performance of our time-interaction (TI) approach over different scenarios. We compare our estimator, in terms of root mean square error (RMSE) for the population size, to a selection of literature proposals: the $M_t$ and $M_{th}$ models available in the R package ctime (Schofield et al., 2018), Chao's lower bound estimator (Chao, 1987), the Poisson model with unobserved heterogeneity (Rivest and Baillargeon, 2007), the Generalized Chao estimator, which includes covariates and a behavioral effect (Böhning et al., 2013; Farcomeni, 2018), and an additive version of the continuous-time model by Lin and Yip (1999). The latter corresponds to an $M_{tbh}$ model with a Weibull baseline hazard and a permanent behavioral effect, and is a variation of the model by Hwang and Chao (2002). For our approach, we evaluate the most general time-interaction process ($M_{htb}$) together with a model that does not include behavioral effects ($M_{ht}$). A self-exciting-only and a self-correcting-only version of the process have also been tested and are not reported here, as they have a similar performance to the other methods and are outperformed by the complete model in all cases.
We then generate different scenarios through combinations of $\lambda_2 \in \{1, 2\}$, $\alpha_2 \in \{0.5, 2\}$, $\gamma_2 \in \{0, 1\}$, and $\kappa \in \{1.2, 1.3, 1.4, 1.5\}$. Parameters are calibrated to lead to different sampling ratios and degrees of heterogeneity. The sampling ratios range from 5% to nearly 90% of the population size, which also supports the stability of the estimates in the application in Section 5. In addition, we generated data under three scenarios that are special cases of a general $M_{htb}$ process: no self-exciting effect ($\alpha_1 = \alpha_2 = 0$), no self-correcting effect ($\gamma_1 = \gamma_2 = 0$), and no time effect ($\kappa = 1$), to show the performance of our approach even when data are generated under a simpler model. We let $N \in \{500, 1000, 5000\}$, leading to a hundred and five simulation settings. Additionally, we test the first four parameter combinations for $N = 10{,}000$ for a complete evaluation of the estimator's confidence intervals. For each parameter combination, we generate 1000 datasets, over which we average results to estimate the RMSE for each population size estimator.
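Event times from a history-dependent intensity can be generated by Ogata-style thinning; a minimal sketch under the stated assumptions (the dominating rate `lam_max` must bound the intensity on the whole interval; the paper does not specify its simulation algorithm).

```python
import random

def simulate_by_thinning(intensity, T, lam_max, rng=None):
    # propose events from a homogeneous Poisson process with rate lam_max
    # (assumed to dominate intensity(t, history) on (0, T)), then keep
    # each proposal with probability intensity / lam_max
    rng = rng or random.Random(0)
    t, history = 0.0, []
    while True:
        t += rng.expovariate(lam_max)
        if t > T:
            break
        if rng.random() < intensity(t, history) / lam_max:
            history.append(t)
    return history
```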
Tables 1 and 2 report the simulation settings, the sampling ratios, and the resulting RMSEs, with the best-performing estimator highlighted in bold for each scenario. It can be seen that our proposal achieves the smallest RMSE in almost all scenarios, with the remaining ones being a close call between $M_{htb}$ and model $M_{th}$. The $M_{th}$ model seems to be a competitive alternative only when the contribution of the time effect is at its strongest and the population size is small. The submodel without behavioral effects, $M_{ht}$, has a worse performance, similar to that of some of the competing models; for $N = 5000$ the latter perform better but never become close competitors of our proposal. In general, as $N$ increases from 500 to 1000 and 5000, the gap with other estimators increases. In Table 2, we also check sensitivity to the specification of $\lambda_0(t)$ by generating data with a baseline step function with five jumps equally spaced between 0 and $T$, of sizes 0.067, 0.133, 0.200, 0.267, and 0.333, respectively. Our model is still fitted using a Weibull baseline. It can be seen that, at least with this setting and over a variety of simulation scenarios, there is little sensitivity to misspecification of the baseline rate; our approach still shows the best performance.
The good theoretical properties of our approach are also supported by the fact that the RMSE decreases with the sample size at the expected rate; the same was successfully checked for the model parameters. Additionally, for the first scenarios we have evaluated the coverage of 95% confidence intervals (CIs) after computing the standard error of $\hat{N}_{htb}$ as described in the previous section. We obtain an empirical coverage of 0.999 for $N = 500$, 0.989 for $N = 1000$, 0.976 for $N = 5000$, 0.963 for $N = 10{,}000$, and 0.979 for $N = 500$ with the misspecified baseline.
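The performance measures used above can be computed over replicates as follows (illustrative Python; inputs are hypothetical arrays of simulated estimates and CIs).

```python
def rmse(estimates, true_N):
    # root mean square error of a population size estimator over replicates
    return (sum((e - true_N) ** 2 for e in estimates) / len(estimates)) ** 0.5

def coverage(cis, true_N):
    # share of (lower, upper) confidence intervals containing the true size
    return sum(lo <= true_N <= hi for lo, hi in cis) / len(cis)
```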

ESTIMATION OF THE NUMBER OF DRUG DEALERS IN ITALY
We consider a dataset of drug dealers sanctioned under the Italian law 309/90 (Bossi-Fini law). We have data about all arrests by any of the Italian police forces, and we fix time zero as the publishing of the law in the "Gazzetta Ufficiale," which is the moment in which new laws enter into force in Italy. Our main target for the analysis is the estimation of the number of drug dealers in Italy at the time. A secondary aim is assessing the features of the observation process, which can give information about what modulates the risk of identification of drug dealers by police forces. Our analysis is based on a few rather strong assumptions. First of all, our target population is made up of all dealers who were not in jail at time zero. We assume that no new dealer entered or exited the drug market during the observation period, which clearly cannot be precisely true. Additionally, we assume all drug dealers continued their activity throughout the observation period after being identified (even after they served a short time in jail, in case this happened). We restrict the dataset to 2 years to make these assumptions more credible. Our choice for the time span is comparable to that of previous studies (e.g., Chiang et al., 2007; Bouchard and Tremblay, 2005). It shall be further added that our estimates are fairly stable with respect to this choice, due to the fact that many dealers were identified shortly after the onset of the new law. Clearly, the standard error of our estimates is inversely associated with the length of the observation period.
The original dataset is provided as a matrix, where each row describes a single capture; for each capture, some subject-specific information (sex, age) is reported together with the capture date. We reorganize the dataset in the format of an R list, where each element contains the capture times of one subject. Sex and age are organized in a separate covariate matrix whose number of rows equals the number of elements of the list (one for each subject). The number of subjects identified at least once is $n = 4271$; 3831 dealers were males, while only 440 were females. Figure 1 summarizes some preliminary information about age and capture times, in the form of histograms. The median age was 32 (mean 33), with minimum 14, first quartile 26, third quartile 39, and maximum 82. The great majority of subjects, that is 4170, are captured only once; 99 dealers are captured twice, and two are stopped three times.
The recapture pattern may be a consequence of a combination of factors: the probability of meeting the same dealer within 2 years is low, especially in big cities. Moreover, sanctions and legal consequences become more severe after the first capture; therefore, subjects are expected to be trap-shy. The counting distribution therefore shows one-inflation (Farcomeni and Scacciatelli, 2013; Godwin and Böhning, 2017; Böhning and Van der Heijden, 2019; Böhning and Friedl, 2021). This is not a particularly problematic issue with our continuous-time approach, which takes into account the time interval between time zero and the event. The total number of drug dealers in Italy for the considered time interval is estimated with our approach; we compare our estimator with the Chao, generalized Chao, $M_t$, and $M_{th}$ models. In Table 3, we report the estimated population size $\hat{N}$, the corresponding 95% CI, and the Akaike information criterion (AIC). We only show the results for a selection of the models we have estimated, omitting several others that had a clearly worse performance in terms of AIC. For Chao's estimator, we report the AIC computed through its likelihood-based representation (Böhning et al., 2013; Farcomeni, 2018; Dotto and Farcomeni, 2018). In Table 3, the letters $h$, $c$, $t$, and $b$ stand for unobserved heterogeneity, covariates, time effects, and behavioral effects, respectively. Numbers refer to the number of latent classes for models with unobserved heterogeneity (see Equation 10). The notation $htb_2$ refers to a constrained model, which includes unobserved heterogeneity based on $K = 2$ latent classes, time effects, and self-correcting behavioral effects, but with fixed $\tilde{\alpha} = (0, 0)$, $\tilde{\beta} = (1, 1)$, and $\gamma_1 = 0$.

TABLE 1 Simulation study, part 1: RMSE of the $M_t$ ($\hat{N}_t$) and $M_{th}$ ($\hat{N}_{th}$) estimators (ctime library), the Chao estimator, the Poisson estimator, the generalized Chao estimator, and the additive hazard Lin and Yip estimator, compared to our proposal with the complete model ($\hat{N}_{htb}$) and a submodel without behavioral effects ($\hat{N}_{ht}$). Note: For each parameter setting, we report the sampling ratio ($n/N$), the coverage, and the root mean square error (RMSE) for each estimator, averaged over 1000 replicates. In bold, the smallest RMSE for each scenario.

FIGURE 1 Drug dealers data: left, histogram of age; middle, day of the first capture; right, days between captures.

TABLE 2 Simulation study, part 2: RMSE of the $M_t$ ($\hat{N}_t$) and $M_{th}$ ($\hat{N}_{th}$) estimators (ctime library), the Chao estimator, the Poisson estimator, the generalized Chao estimator, and the additive hazard Lin and Yip estimator, compared to our proposal with the complete model ($\hat{N}_{htb}$) and a submodel without behavioral effects ($\hat{N}_{ht}$).
Covariates do not seem to be important (it shall be remarked that they are not significant even when considered separately), which is mild evidence that there is little age and gender profiling. The gender effect might also be lacking due to the very limited number of female drug dealers, which makes it difficult to detect. For the data at hand, the models give fairly stable estimates of the population size regardless of the formulation. The best model in terms of AIC is $htb_2$, which has no self-exciting effect and shows a mild improvement over the unconstrained model; its parameter estimates are reported in Table 4. One of the latent classes (about $\hat{\pi}_1 = 39.5\%$ of the dealers) shows no self-correcting effect, and hence no behavioral effects overall. This provides evidence that one subgroup of dealers does not modify its behavior after the first capture. The other latent class, regarding $\hat{\pi}_2 = 60.5\%$ of the dealers, shows a self-correcting effect, according to which dealers have a decreased risk of being stopped after the first capture. This trap-shyness effect is reasonable in the context of criminal offenses, where many dealers become more cautious after being caught by the police. In Table 4, one can see that the behavioral parameter of the second class is small but relevant, being linked to the scale of the data. It can be concluded that the correcting effect slowly decreases over time. The fitted intensity is shown in Figure 2. For the first latent class (left panel), with no behavioral effects, the intensity depends only on the baseline parameters and smoothly decreases over time irrespective of the captures. In the second latent class, the behavioral parameter is also involved, and the number of captures enters the computations. The right panel of the figure shows three fitted intensities, partitioning subjects by the number of captures; for each group, the mean capture times are taken to show the jumps in the intensity function representing the trap-shyness effect.
Other models used for comparison, with the exception of $M_{th}$, which might possibly overestimate the population size, give similar estimates. On the other hand, Chao-type estimators have larger CIs and can thus be deemed less efficient. The data cannot be expected to be consistent with other configurations: for instance, in order to claim trap-happiness for the second latent class, one would have to obtain a very low common hazard early on, which under the Weibull model might not be consistent with a high later hazard.

CONCLUSIONS
We have proposed a general framework for estimating the size of closed populations with continuous-time capture-recapture experiments. A number of proposals are present in the recent literature regarding behavioral models for capture-recapture data in discrete time (e.g., Yang and Chao, 2005; Farcomeni, 2011; Alunni Fegatelli and Tardella, 2016). The continuous-time case remains as yet relatively unexplored. The proposed time-interaction processes are arguably useful for modeling complex behavioral effects in continuous-time population size estimation. Time-interaction processes embed both self-exciting and self-correcting processes and are, in our opinion, of independent interest. Our model for population size estimation is based on a conditional likelihood formulation for time-interaction processes, in which units with no events are never observed. It allows for covariates, unobserved sources of heterogeneity, and time effects; and it is particularly useful in situations where a complex behavioral effect may be expected, as in our motivating application. Our model compares with Hwang and Chao (2002), which can also take into account time effects, behavioral response, and individual heterogeneity. There are two main differences: unlike Hwang and Chao (2002), and more in the spirit of Lin and Yip (1999), we specify a parametric baseline hazard. On the other hand, in Hwang and Chao (2002) a classical constant and permanent effect is specified for the behavioral response, which is a special case of our time-interaction framework. For the data at hand, regarding the number of drug dealers in Italy over the years 2005 and 2006, our best population size estimate equals 91,368 dealers. Over a population of 58.14 million (Italy official data, 2006), this amounts to about 1.6 dealers per 1000 Italians. We believe this estimate to be credible when we consider the definition of a drug dealer according to Italian law at the time of the study.
In addition, our conclusions compare well with Bouchard and Tremblay (2005), whose estimates range from 1 to 2.7 per 1000 Quebec inhabitants in 1998. We find that about 60% of the dealers underwent a self-correcting behavioral response, according to which they were slightly less likely to be stopped again by the police shortly after the event, all else fixed. It could be argued that the very limited number of recaptured individuals in the real data example could make it difficult to estimate behavioral effects. It shall be kept in mind, though, that the model takes into account the time to first capture, the time between captures (if any), and the time between the last capture and the end of follow-up. Since subjects are continuously at risk of being captured, a subject observed only once shortly after time zero is different from a subject observed only once shortly before the end of follow-up. The latter is unlikely to have the opportunity to show a behavioral response and will mostly contribute to the baseline hazard. The former can be deemed to be trap-shy, trap-happy, or to have no behavioral response, depending on what is consistent with the common baseline hazard. There is no risk of confounding with the baseline hazard, since it has a parametric form and is common to all subjects (up to equal covariate profiles).
A limitation of the current work, as is common in this literature, is that we must assume that the population is closed. This has led us to restrict the observation period and discard valuable information about dealers stopped under the 309/90 law after 2006. Moreover, for obvious privacy reasons we only have limited information on the captured subjects and none about the officers. In addition, further work might be devoted to the derivation of profile CIs, joint population size and total estimation (Farcomeni, 2022), or ways to exploit information about the location of identifications (e.g., Borchers and Fewster, 2016; Stevenson et al., 2021).

A C K N O W L E D G M E N T S
This work has been developed within the framework of the European Union (EU) project JUST/2010/DPIP/AG/1410: "New methodological tools for policy and programme evaluation," with the financial support of the Prevention and Information Programme of the European Commission. The contents of this publication are the sole responsibility of the authors and can in no way be taken to reflect the views of the European Commission. The authors are grateful to the editor, an assistant editor, and three referees for their constructive comments.
Open Access Funding provided by Universita degli Studi di Bologna within the CRUI-CARE Agreement [Correction added on May 19, after first online publication: CRUI-CARE funding statement has been added.]

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this paper are not publicly available due to privacy restrictions.