Fuzzy c-ordered medoids clustering for interval-valued data

Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a partitioning-around-medoids approach. However, one of its greatest disadvantages is its sensitivity to the presence of outliers in the data. This paper introduces a new robust fuzzy clustering method named Fuzzy c-Ordered-Medoids clustering for Interval-valued Data (FcOMdC-ID). Huber's M-estimators and Yager's Ordered Weighted Averaging (OWA) operators are used in the proposed method to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids method in experiments performed on synthetic data with different types of outliers. A real application of the FcOMdC-ID is also provided.


Introduction
In many real applications, empirical information represented by data cannot be expressed in terms of single-valued data but only in the form of interval-valued data. We can distinguish the following situations. The phenomena cannot be explained using single-valued data, i.e. the data are intrinsically interval-valued. Examples are the concepts of monthly temperature in meteorological stations or the daily rate of exchange between euro and dollar or euro and sterling. For example, for daily temperatures or daily pollution levels registered in different places, or for mineral concentrations of food items, it can be more interesting to consider the minimum and maximum values registered than the average ones, because they offer more detailed information about the examined phenomenon, taking into account the variability of the features involved. Data in which each observation is given as an interval of values, indicated by a minimum and a maximum, are called interval-valued data. Notice that the intervals need not pertain to the actually observed maxima and minima, but could also pertain to, for instance, interquartile intervals, or intervals pertaining to the middle 90% of the scores [44].
Interval data may occur due to a lack of knowledge, that is when the true value of a variable is unknown and only an interval of values including the true value is available. Thus, the available information is imprecise and therefore cannot be revealed exactly by single numerical data.
Interval-valued data may also arise as a result of aggregating huge databases, which are impossible to analyze in the original form.
In all these cases, standard statistical methods for single-valued data (numerical data) are unable to properly take into account the nature of the empirical information formalized in an interval manner (interval-valued data).
In the literature, there are many examples of real applications with interval-valued data. For instance, "researchers often have only interval data on variables that can, in principle, be measured more precisely [i.e.] the interval data on wealth in the Health and Retirement Study (HRS) provide a ready illustration [61]. Let v denote a person's wealth. Under the HRS questionnaire protocol, a respondent is asked to report v. If he does not comply, the respondent is then asked to report if wealth falls within a sequence of brackets. The HRS thus yields a wealth interval [v0, v1] for each respondent. The interval is degenerate when a respondent provides a point value of wealth, is an informative interval of positive width when the respondent answers the subsequent bracket questions, and is the uninformative interval [−∞, +∞] otherwise" [76].
In database protection and security intelligence, for protecting confidential data, e.g. health information of different patients, it is necessary that statistical programs do not have full access to the data, because otherwise, by computing sufficiently many different statistics, it would be possible to uniquely reconstruct the actual values; in these cases, to prevent this from happening, it is useful to supply the data processing programs not with exact data but with an interval of possible values. In environmetrics, we may refer to the minimum and maximum values of pollutant concentration data recorded at various places in the considered period; in finance, we may examine the minimum-maximum daily rate of exchange between euro and dollar or euro and sterling; in medicine, we may refer to the minimum and maximum values of the daily systolic and diastolic pressure, pulse rate and temperature of patients; in meteorology, we can consider the minimum-maximum daily temperature, humidity and wind speed registered in different places; in chemometrics, we may study the range of mineral concentrations of food products analyzed at different times or in different experimental situations; and so on.
Recently, great attention has been focused on fuzzy clustering of interval-valued data (see, e.g., [35,36,14,21,33]). In this regard, an interesting line of research is the management of fuzzy clustering in the presence of outlier interval-valued data, i.e. robust fuzzy clustering [35,33].
According to Huber [56], a robust method should have the following properties: it should have a reasonably good accuracy at the assumed model; small deviations from the model assumptions should impair the performance only by a small amount; larger deviations from the model assumptions should not cause a catastrophe.
In the literature, in a fuzzy clustering framework for single-valued data, different robust approaches can be considered: Noise approach: outliers are assigned to the so-called noise cluster (see, e.g., [11-13,82]).
Semifuzzy approach: the problem of outliers is avoided by setting a maximum number of clusters that a datum can belong to, setting membership degrees to zero if a predefined maximal distance is exceeded, or by defining a minimum threshold for membership degrees [84].
Metric approach: metrics with robust properties are incorporated in the objective functions of the clustering problem; e.g., the L_p norm (0 < p < 1) [53], the L_1 norm [60,65,72], and the exponential metric [92]. Within the metric approach, we can consider the ordered statistics sub-approach. In this case the concept of weight functions in robust statistics is related to the concept of membership functions in fuzzy set theory or to possibility distributions in possibility theory. Different types of robust estimators (such as the M, R and L estimators, and the least median of squares method) can be adopted [56]. Frigui and Krishnapuram [38] proposed a robust fuzzy clustering algorithm, termed the robust C-prototypes algorithm, based on a generalization of the M-estimator to estimate the C prototypes and on the application of a loss function ρ to squared distances to reduce the effect of outliers. Davé and Krishnapuram [12] discussed the M-estimator and the connection between robust statistics (i.e. the M-estimator) and fuzzy clustering. The M-estimator uses a suitable symmetric positive-definite function (called the robust loss function) and forms the objective function by summing the loss over all points. Winkler et al. [91] proposed a robust fuzzy clustering with a polynomial fuzzifier function in connection with M-estimators, in which for each data point its distances to the prototypes are ordered. Leski [72] proposed a robust fuzzy clustering method using Vapnik's ε-insensitive estimator, in which the fuzzy c-medians can be obtained as a particular case of the suggested method. In order to improve the robustness of fuzzy clustering, Leski [74] used Huber's M-estimators together with Yager's OWA (Ordered Weighted Averaging) operators [93]. In particular, Leski [74] combined the fuzzy c-means clustering with robust ordered statistics using Huber's M-estimator to develop a new fuzzy clustering method called Fuzzy C-Ordered-Means clustering.
Possibilistic approach: following possibility theory, membership degrees are expressed as degrees of compatibility (possibility degrees); in this approach, to avoid the problem of outliers, outliers are included in all clusters with small memberships (see, e.g., [68,69,87,94,83]).
Evidential approach: new clustering techniques based on the concept of belief functions [77].
Influence weighting approach: a suitable weighting system for objects is used in the clustering process and low weights are objectively assigned to outliers [64,29].
In a non-single-valued data analysis framework, i.e. fuzzy clustering for imprecise data (fuzzy and interval data), only some of the previous methodological approaches have been explored. In particular, Butkiewicz [4] proposed a transformation of fuzzy data into crisp data and used an ordered statistics approach based on a classical robust fuzzy clustering model for single-valued data. Hung and Yang [58] suggested a metric approach-based robust fuzzy clustering for univariate fuzzy data by considering an exponential distance. Zarandi et al. [96] suggested an influence weighting approach-based robust fuzzy clustering model for fuzzy data using a Wasserstein-type distance. Coppi et al. [10] proposed a possibilistic approach-based robust clustering method for multivariate fuzzy data. D'Urso and De Giovanni [32] proposed robust fuzzy clustering methods for fuzzy data based on different methodological approaches, i.e. the noise, metric and trimmed approaches. For interval-valued data, to our knowledge, only two robust fuzzy clustering methods have been proposed in the literature: the noise approach-based fuzzy clustering proposed by D'Urso and Giordani [35] and the trimmed approach-based fuzzy clustering suggested by D'Urso et al. [33].
In this paper, by adopting a Partitioning Around Medoids (PAM) procedure, we propose a fuzzy clustering method using Huber's M-estimators and Yager's OWA (Ordered Weighted Averaging) operators. The proposed method belongs to the ordered statistics approach and inherits the advantages of the PAM clustering strategy, fuzzy theory and the robust methodological approach. By using a PAM-based clustering approach, each cluster is represented by an observed object (medoid) and not by a fictitious object (centroid) (as in the fuzzy c-means clustering), which is very appealing and useful in a wide range of applications. This is also very important for the interpretation of the selected clusters.
The concept of partial membership underlying the fuzzy approach appears to be more appealing and flexible than that of the classical clustering procedures [78,89]; furthermore, the fuzzy clustering methods have been shown to be computationally more efficient because dramatic changes in the value of cluster membership are less likely to occur in estimation procedures [78] and less afflicted by local optima problems [54]; in addition, the memberships for any given set of respondents indicate whether there is a second-best cluster almost as good as the best cluster-a result which traditional clustering models cannot uncover [37].
Notice that PAM-based fuzzy clustering represents a robustification of the fuzzy c-means clustering; however, it provides only a "timid robustification", because a single outlier can still break down the clustering [39]. Moreover, as remarked by Garcia-Escudero et al. [43], the clustering model based on medoids "resists to the presence of 1 outlier in a remote position, but it breaks down when we increase to 3 the number of outliers". For this reason, in order to neutralize the disruptive effects of multiple outlier interval-valued data in the clustering process, we propose a robust fuzzy c-medoids clustering method using a strategy based on the combination of Huber's M-estimators and Yager's OWA (Ordered Weighted Averaging) operators [93].
The paper is organized as follows. In Section 2, we describe interval-valued data and a metric for comparing interval-valued data; subsequently, we propose a robust fuzzy clustering method for interval-valued data belonging to the ordered statistics sub-approach, by considering simultaneously Huber's M-estimators and Yager's Ordered Weighted Averaging (OWA) operators [93]. In particular, we combine the fuzzy c-medoids clustering with robust ordered statistics using Huber's M-estimator to develop a new fuzzy clustering method called Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). In Section 3, we present a simulation study and in Section 4 we apply our method to a real-world case. In Section 5, some remarks conclude the paper.

A robust fuzzy clustering method for interval-valued data: FcOMdC-ID (Fuzzy c-Ordered Medoids Clustering for Interval-valued Data)

Interval-valued data
Algebraically, an interval-valued datum can be formalized as x_kj = [x̲_kj, x̄_kj], k = 1, …, N, j = 1, …, J, where x_kj represents the jth interval-valued variable observed on the kth object; x̲_kj and x̄_kj denote, respectively, the lower and upper bounds of the interval; in particular, they represent the minimum and maximum values registered for the jth interval-valued variable with respect to the kth object. Each object is represented geometrically by a hyperrectangle in R^J having 2^J vertices. The 2^J vertices correspond to all the possible (lower bound, upper bound) combinations. In particular, in R (J = 1) the generic object is represented by a segment; in R^2 (J = 2), it is represented by a rectangle with 2^2 = 4 vertices, and so on [8].
Let us assume J interval-valued variables are observed on N objects. Then, the data can be stored in the so-called interval-valued matrix X = (x_kj) = ([x̲_kj, x̄_kj]), k = 1, …, N, j = 1, …, J, where x_kj represents the jth interval-valued variable observed on the kth object and x̲_kj and x̄_kj indicate, respectively, the lower bound (min) and upper bound (max) of the interval-valued datum x_kj. If we deal with J single-valued variables, each object is represented as a point in the reference space R^J. Instead, in the case of interval-valued data, each object is represented as a hyperrectangle (in R^J) having 2^J vertices (a rectangle with 2^2 = 4 vertices if J = 2). Then, the kth row of X refers to the kth object, which can be seen as a hyperrectangle in R^J.
To this purpose, we indicate by M the matrix whose generic element is the midpoint (center) of the associated interval, i.e. the midpoint matrix (center matrix) M = (m_kj), where m_kj = (x̲_kj + x̄_kj)/2 is the midpoint (center) of the associated interval value for k = 1, …, N and j = 1, …, J. Furthermore, we define the radius matrix (spread matrix) R of order (N × J) as R = (r_kj), where r_kj = (x̄_kj − x̲_kj)/2 is the radius (spread) of the associated interval value for k = 1, …, N and j = 1, …, J. Then, we can denote the generic interval-valued datum pertaining to the kth object with respect to the jth interval-valued variable as the ordered couple (m_kj, r_kj), where m_kj denotes the midpoint (center) and r_kj the radius (spread). In this way, the lower and upper bounds of the interval-valued datum can be obtained as m_kj − r_kj and m_kj + r_kj, respectively.
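As a minimal illustration of the (m_kj, r_kj) reformulation above, the conversion between the bound matrices and the midpoint/radius matrices can be sketched in Python; the function name is ours, not from the paper:

```python
import numpy as np

def bounds_to_midpoint_radius(lower, upper):
    """Convert interval bounds (N x J arrays of lower/upper values)
    into the midpoint matrix M and radius matrix R described above."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    m = (lower + upper) / 2.0   # midpoints (centers)
    r = (upper - lower) / 2.0   # radii (spreads)
    return m, r

# The bounds are recovered as m - r (lower) and m + r (upper).
```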
By considering the previous reformulation of the interval-valued data, the interval-valued matrix can be reformalized as X = ((m_kj, r_kj)), k = 1, …, N, j = 1, …, J. Notice that, as for standard (single-valued) datasets, also in the interval case the data can be corrupted by noise and outliers. In particular, we can distinguish three possible types of outlier interval-valued data (see Fig. 1; inlier data = non-outlier data; [29]): 1. outlier midpoint (center) and inlier radius (spread); 2. inlier center and outlier spreads; 3. outlier center and outlier spreads.

Interval-valued data pre-processing
Usually, in classical statistical analysis for single-valued data, data standardization is recommended to prevent the results from being strongly affected by the measurement scales. Also in interval-valued data analysis, suitable data pre-processing (centering, normalization and standardization) is recommended. In the literature, there are several standardization measures for interval data. For instance, de Carvalho et al. [15,18] proposed some alternative standardization methods for interval data. Guo et al. [48] proposed a standardization of interval data based on empirical descriptive statistics. See also Billard and Diday [3], D'Urso and Giordani [35], D'Urso and De Giovanni [31] and D'Urso et al. [33].

A distance measure based on the midpoint and radius of the interval data
A distance used for clustering interval-valued data is the distance measure proposed by D'Urso and Giordani [34] and successively adopted by D'Urso and De Giovanni [31] and D'Urso et al. [33].
In particular, in order to compare two generic objects k′ and k″ described by J interval-valued variables, we consider all the vertices of the two hyperrectangles pertaining to the objects involved. Thus, we have the following Euclidean distance measure for interval-valued data:

d²(k′, k″) = (1/2^J) ∑_{ν=1}^{2^J} ‖(m_k′ + h_ν ∗ r_k′) − (m_k″ + h_ν ∗ r_k″)‖²,   (5)

in which x_k′, x_k″, m_k′, m_k″, r_k′ and r_k″ denote, respectively, the k′th and k″th rows of X, M and R, and the symbol ∗ is the Hadamard product, i.e. the elementwise product of two matrices (vectors) of the same order. Furthermore, the vectors h_ν, ν = 1, …, 2^J, help us to define every vertex of the hyperrectangle associated with each object separately. In fact, their elements are equal to ±1 in order to refer exactly to every vertex. The vectors h_ν, ν = 1, …, 2^J, are the rows of a new matrix, say H, of order (2^J × J).
If J = 3, H is the (2^3 × 3) = (8 × 3) matrix whose rows enumerate all the sign patterns (±1, ±1, ±1). Using the vector h_1 (the first row of H, with all elements equal to −1) we have the vector of the lower bounds pertaining to the kth object:

x̲_k = m_k + h_1 ∗ r_k = m_k − r_k.   (7)

Analogously to (7), by means of h_5 (with all elements equal to +1), we can obtain the vector of the upper bounds:

x̄_k = m_k + h_5 ∗ r_k = m_k + r_k.   (8)

Thus, it can be shown that the Euclidean distance (5) can be simplified as

d²(k′, k″) = ‖m_k′ − m_k″‖² + ‖r_k′ − r_k″‖².   (9)

Notice that the distance measure (9) is the same distance used in de Carvalho et al. [15], which is a special case of the one in de Carvalho and de Souza [16].
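The simplified form (9) is straightforward to compute from the midpoint and radius vectors; a minimal sketch (function name ours):

```python
import numpy as np

def interval_sq_distance(m1, r1, m2, r2):
    """Squared Euclidean distance between two interval-valued objects,
    given their midpoint vectors m and radius vectors r (length J each):
    sum of squared midpoint differences plus squared radius differences."""
    m1, r1, m2, r2 = (np.asarray(v, dtype=float) for v in (m1, r1, m2, r2))
    return float(np.sum((m1 - m2) ** 2) + np.sum((r1 - r2) ** 2))
```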

Robust distance measures for interval data
The distance measure recalled in the previous subsection uses a quadratic loss function (L_SQR(e) = e², where the so-called model residuals are e = m_k′j − m_k″j or e = r_k′j − r_k″j) as a dissimilarity measure between the interval-valued data and the medoids. The reason for using this measure is mathematical, i.e. simplicity and a low computational burden. However, this approach is sensitive to noise and outliers. In the literature there are many proposals of robust loss functions. For example, Huber's loss function is of special interest [56]:

L_HUB(e) = e², for |e| ≤ δ;   L_HUB(e) = δ(2|e| − δ), for |e| > δ,   (10)

where δ > 0 denotes a parameter. Many other robust loss functions may be taken into account, such as the SIGmoidal (SIG) loss with parameters α, β > 0 and the LOGarithmic (LOG) loss. For a vector argument e = [e_1, e_2, …, e_J], a loss function takes the form L(e) = ∑_{j=1}^{J} L(e_j).
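To illustrate, the sketch below implements Huber's loss as given above, together with sigmoidal and logarithmic losses; the exact SIG and LOG parametrizations of the paper are not recoverable from the text, so those two forms are illustrative stand-ins:

```python
import numpy as np

def huber_loss(e, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails,
    so large residuals (outliers) are down-weighted."""
    a = np.abs(e)
    return np.where(a <= delta, a ** 2, delta * (2.0 * a - delta))

def sig_loss(e, alpha=1.0, beta=1.0):
    """Sigmoidal loss (illustrative form): saturates for large residuals."""
    return 1.0 / (1.0 + np.exp(-alpha * (np.abs(e) - beta)))

def log_loss(e):
    """Logarithmic loss (illustrative form): grows slowly for large residuals."""
    return np.log1p(np.asarray(e, dtype=float) ** 2)
```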

The clustering method
The different robust loss functions confer robustness on the distance measure. Obviously, in this case we consider the distance between pairs of objects (and not between an object and a fictitious prototype). Using

D(x_k, x̃_i) = ∑_{j=1}^{J} [L(m_kj − m̃_ij) + L(r_kj − r̃_ij)]   (14)

as a dissimilarity measure between the kth interval-valued datum and the ith medoid x̃_i = (m̃_i, r̃_i), and additional weighting, the fuzzy c-ordered-medoids criterion function takes the form

J(U, X̃) = ∑_{i=1}^{c} ∑_{k=1}^{N} β_k (u_ik)^λ D(x_k, x̃_i),   (15)

where λ is a weighting exponent in [1, ∞). This parameter influences the fuzziness of the clusters. β_k ∈ [0, 1] denotes the typicality of the kth datum with respect to the clusters; a smaller β_k indicates a more atypical datum. These parameters are derived based on the ordering of the distances of the data from the medoids [74,75].
The set of all possible fuzzy partitions of N vectors into c clusters is defined by

𝔉 = { U ∈ R^(c×N) | u_ik ∈ [0, 1] for all i, k; ∑_{i=1}^{c} u_ik = 1 for all k; 0 < ∑_{k=1}^{N} u_ik < N for all i }.   (16)

The overall assessment of the typicality of the kth datum is obtained using an s-norm S (⋆_S) [74]:

β_k = β_1k ⋆_S β_2k ⋆_S ⋯ ⋆_S β_ck,   (17)

which may be linguistically interpreted as the following sentence: "The kth datum is typical IF AND ONLY IF the kth datum is typical with respect to the first cluster OR the kth datum is typical with respect to the second cluster OR ⋯ OR the kth datum is typical with respect to the cth cluster". β_ik ∈ [0, 1] denotes the typicality of the kth datum with respect to the ith cluster. Instead of a general s-norm, the maximum operation is chosen:

β_k = max_{1 ≤ i ≤ c} β_ik.   (18)

The necessary condition for the minimization of (15) with respect to the elements of the partition matrix can be described as

u_ik = [ ∑_{j=1}^{c} ( D(x_k, x̃_i) / D(x_k, x̃_j) )^(1/(λ−1)) ]^(−1).   (19)

A local optimal solution with respect to the medoids can be obtained as follows [33]:

x̃_i = x_{q_i},   (20)

where

q_i = arg min_{1 ≤ h ≤ N} ∑_{k=1}^{N} β_k (u_ik)^λ D(x_k, x_h).   (21)

Let π_i: {1, 2, …, N} → {1, 2, …, N} be the permutation function for the ith medoid. The rank-ordered dissimilarities between the ith medoid and the data satisfy the following conditions:

D(x_{π_i(1)}, x̃_i) ≤ D(x_{π_i(2)}, x̃_i) ≤ ⋯ ≤ D(x_{π_i(N)}, x̃_i).   (22)

If the β_ik parameters fulfill β_{iπ_i(1)} ≥ β_{iπ_i(2)} ≥ ⋯ ≥ β_{iπ_i(N)}, then the impact of outliers is reduced by down-weighting the respective dissimilarities. The form of the parameters β_ik may be piecewise-linear [74,75]

β_{iπ_i(n)} = 0 ∨ [ 1 ∧ ( (p_c N − n)/(2 p_l N) + 0.5 ) ],   (23)

or sigmoidal

β_{iπ_i(n)} = 1 / ( 1 + exp( (2.944/(p_a N)) (n − p_c N) ) ),   (24)

where ∧ and ∨ denote the min and max operations, respectively. The functions defined by (23) and (24) are called Piecewise-Linearly-weighted OWA (PLOWA) and Sigmoidally-weighted OWA (SOWA), respectively. Both functions are nonincreasing with respect to the argument n ∈ {1, 2, …, N}. For n = p_c N both functions are equal to 0.5. The parameters p_l > 0 and p_a > 0 influence their slope. In the case of the piecewise-linear function, for n ∈ [p_c N − p_l N, p_c N + p_l N] its value decreases linearly from 1 to 0 [74]. For the sigmoidal function, the constant 2.944 is chosen so that for n ∈ [p_c N − p_a N, p_c N + p_a N] its value decreases from 0.95 to 0.05 [74].
The following values were used: p_c ∈ [0.7, 1.0], p_l = 0.1, p_a = 0.1. If the ordering of dissimilarities from (22) is not applied, which is equivalent to using the uniform weighting function for OWA (UOWA; β_{iπ_i(n)} = 1 for all i, n), then we refer to this case as clustering without ordering (or with no weighting function). The final form of the proposed Fuzzy c-Ordered-Medoids Clustering for Interval-valued Data (FcOMdC-ID) can be described as follows.

Algorithm FcOMdC-ID
1. Fix c, λ and ξ; initialize the medoids X̃^(1) and set the iteration index ℓ = 1,
2. Calculate the fuzzy partition matrix U^(ℓ) for the ℓth iteration using (19),
3. Rank-order the dissimilarities between the ith medoid and the data (see (22)), obtaining the permutation function π_i(k),
4. Calculate β_{iπ_i(k)} using (23) or (24) or uniform weighting,
5. Update the overall typicality parameters β_k using (18),
6. Update the medoids X̃^(ℓ+1) using U^(ℓ) and (20) and (21),
7. If ‖X̃^(ℓ+1) − X̃^(ℓ)‖_F > ξ, then ℓ ← ℓ + 1 and go to Step 2; else stop.
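Putting the pieces together, a compact sketch of the FcOMdC-ID iteration (with SOWA weighting and the maximum s-norm) on a precomputed loss-based dissimilarity matrix might look as follows. All names are ours, and the update formulas follow the generic fuzzy-medoid scheme; read them as assumptions wherever the description above leaves details open:

```python
import numpy as np

def fcomdc_id(D, c, lam=1.5, p_c=0.8, p_a=0.1, max_iter=100, seed=0):
    """Illustrative FcOMdC-ID iteration on an (N x N) dissimilarity matrix D.
    Returns medoid indices, the fuzzy partition matrix U (c x N) and the
    overall typicality parameters beta_k."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    med = rng.choice(N, size=c, replace=False)            # initial medoids
    n = np.arange(1, N + 1)
    sowa = 1.0 / (1.0 + np.exp(2.944 / (p_a * N) * (n - p_c * N)))
    beta = np.ones((c, N))
    U = np.full((c, N), 1.0 / c)
    for _ in range(max_iter):
        d = D[med] + 1e-12                                # (c x N) dissimilarities
        # fuzzy partition update (standard fuzzy-c-medoids form)
        U = d ** (-1.0 / (lam - 1.0))
        U /= U.sum(axis=0, keepdims=True)
        # rank-order dissimilarities per cluster; assign SOWA typicalities
        for i in range(c):
            order = np.argsort(d[i])                      # permutation pi_i
            beta[i, order] = sowa                         # nth-closest gets sowa[n]
        beta_k = beta.max(axis=0)                         # overall typicality (max s-norm)
        # medoid update: pick the object minimizing the weighted cost
        w = beta_k * U ** lam                             # (c x N) weights
        new_med = np.array([np.argmin(w[i] @ D) for i in range(c)])
        if np.array_equal(np.sort(new_med), np.sort(med)):
            break
        med = new_med
    return med, U, beta_k
```

For interval-valued data, D would be filled with the loss-based dissimilarities between all pairs of objects before calling the function.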
Notice that the fuzziness parameter λ plays an important role in fuzzy clustering. It should be suitably chosen in advance.
Although 1 < λ < ∞, values too close to 1 will result in a partition with all memberships close to 0 or 1, while excessively large values will lead to disproportionate overlap, with all memberships close to 1/c [90]. Consequently, neither of these extremes for λ is recommended [1].
Although there are some empirical heuristic procedures to determine the value of λ (e.g., [78,90,32]), there seems to be no theoretically justified manner of selecting λ. Wang et al. [88] and Yang et al. [95] suggest λ = 2 and Leski [73] uses λ = 1.1. Since the medoid always has membership equal to 1 in its cluster, raising its membership to the power λ has no effect. Thus, when λ is high, the mobility of the medoids may be lost [30]. For this reason, a value between 1 and 1.5 for λ is recommended by Kamdar and Joshi [62]. For a discussion and a detailed list of references on the choice of λ, see D'Urso and De Giovanni [32].
Notice that, with respect to the clustering methods for interval data suggested in the literature, our clustering method has different advantages. Our method vs. standard (non-fuzzy) clustering methods (e.g., the de Souza-de Carvalho method [23] and the de Carvalho et al. method [18], based on adaptive City-block and Hausdorff distances, respectively; the de Carvalho-Lechevallier methods [19,20], based on single Hausdorff and City-block adaptive distances and on adaptive quadratic distances, respectively; the de Carvalho-de Souza method [21], based on single and adaptive distances): conversely to standard clustering methods for interval data, our method inherits all the benefits of the fuzzy approach to cluster analysis and of the PAM-based clustering approach (see Section 1); in addition, our method is robust to the presence of one or more outliers in the dataset (see Section 1).
Our method vs. non-robust fuzzy clustering methods (i.e. the de Carvalho method [14] and the de Carvalho-Tenorio method [21], based on the k-means algorithm): with respect to the non-robust fuzzy clustering methods proposed in the literature based on the k-means algorithm, our method inherits the benefits connected to the PAM-based clustering approach (see Section 1) and is resistant to the presence of outliers, suitably neutralizing the disruptive effects of possible anomalous data in the clustering process (see Sections 1 and 2).
Our method vs. robust fuzzy clustering methods (i.e. the D'Urso-Giordani method [35], based on the k-means algorithm and the noise approach, and the D'Urso et al. method [33], based on PAM and the trimmed approach): with respect to the D'Urso et al. method [33], our method is more informative on the intensity of the outliers. In particular, since the D'Urso et al. method is based on the trimmed approach, it neutralizes the outliers by removing them from the clustering process. This method achieves its robustness with respect to outliers by trimming away a certain fraction of the data objects and requires the specification of the "trimming ratio", that is, the percentage of data objects to be trimmed, i.e. the percentage of objects discarded in the clustering process and thus not considered in the optimization problem [33]. In this way, the method neutralizes the negative effect of the outliers, but no additional information about the intensity of the anomaly (atypicality) of the outliers with respect to the entire dataset is provided. Conversely, our proposed method neutralizes the disruptive effect of the outliers by adopting the PAM approach and combining Huber's M-estimators and Yager's OWA operators; in this way, by means of the typicality parameters β_k ∈ [0, 1], our method provides suitable information on the atypicality level of the data: if outliers are present in the dataset, our method tends to give them typicalities that are very low or close to 0. Typicality is thus an important means of alleviating the undesirable effects of outliers: by means of the typicality, our method suitably tunes the influence of outlier interval data in the clustering process. With respect to the D'Urso-Giordani method [35], our method inherits the benefits connected to the typicality information and to the PAM-based clustering approach (see Section 1).

Simulation study
In all experiments for FcOMdC-ID, the weighting exponent λ = 1.5 was used. The iterations were stopped as soon as the Frobenius norm of the difference between successive X̃ matrices was less than 10^−4. All experiments were run in the MATLAB environment. The purpose of the first experiment was to investigate the sensitivity of the traditional FcMdC and the FcOMdC-ID methods to various types and numbers of outliers. The two-dimensional (two feature vectors) interval-valued data sets, presented in Fig. 2, consist of three well-separated groups (each of 25 points) and a varying number of outliers. For each group, interval-valued data were generated randomly: both the centers (two features) and the spreads of the three groups were drawn from Gaussian distributions N(m, s) with mean m and standard deviation s. We can see that the medoids terminate near the expected cluster centers and spreads. Fig. 3 illustrates the performance of the FcOMdC-ID method for 20 spread outliers and the SIG loss function. Fig. 4 illustrates the performance of the FcOMdC-ID method for 20 center outliers and the SIG loss function. We can notice that, regardless of the outlier type, the medoids terminate near the expected cluster centers.
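The generation scheme described above can be sketched as follows; the actual group means and standard deviations used in the experiments are not recoverable from the text, so the parameter values below are placeholders:

```python
import numpy as np

def make_interval_group(n, center_mean, center_sd, spread_mean, spread_sd, rng):
    """Generate n two-feature interval-valued data: Gaussian midpoints and
    Gaussian (absolute-valued) spreads, per the simulation setup above.
    The paper's actual parameter values are not reproduced here."""
    m = rng.normal(center_mean, center_sd, size=(n, 2))
    r = np.abs(rng.normal(spread_mean, spread_sd, size=(n, 2)))
    return m, r

rng = np.random.default_rng(0)
# Hypothetical group centers (three well-separated groups of 25 points each):
groups = [make_interval_group(25, c, 1.0, 1.0, 0.2, rng)
          for c in ([0.0, 0.0], [10.0, 0.0], [5.0, 9.0])]
```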
The effects of the FcMdC and FcOMdC-ID methods for a varying number and type of outliers are presented in Figs. 5-7. For the computed terminal medoids, we measured the performance of clustering by the Frobenius norm of the difference between the expected centers/spreads matrix and the terminal medoids matrix. Figs. 5 and 7 show that for few outliers (from 1 to 4 outliers of the center or center/spread type) the terminal medoids determined by all methods are close to the true centers/spreads. However, for a greater number of outliers, the terminal prototype errors are smaller for the FcOMdC-ID method than for the FcMdC method. We can conclude that the best performance of the FcOMdC-ID method is obtained for the SIG loss function. Fig. 6 shows that, despite a large number of outliers (of the spread type), the terminal medoids determined by all methods are close to the true centers/spreads, but the best results were achieved for the FcOMdC-ID method with the SIG and HUB loss functions. Fig. 7 shows that the FcMdC method performance deteriorates catastrophically for 3 outliers. For the FcOMdC-ID method, the terminal medoids determined for all loss functions are close to the true centers/spreads; the performance deteriorates catastrophically, for the LOG loss function only, when the number of outliers equals 18. However, even for a smaller number of outliers, a rather serious increase in errors is visible for the FcOMdC-ID method with the SQR loss function. In the case of the center outliers, the best results were achieved for the FcOMdC-ID method with the SIG loss function.
In all the above experiments, the groups are of the same cardinality (equal to 25). When groups have different cardinalities (unbalanced data), it seems that the smallest cardinality determines the resistance of the method. In order to verify this hypothesis, we conducted the following experiment. The cardinality of one of the groups (located in the upper right corner of Fig. 2) was reduced from 25 to 12 elements. The other groups remained unchanged. Center and spread outliers were added to the data. The effects of the FcMdC and FcOMdC-ID methods for such data and for a varying number of outliers are presented in Fig. 8. Comparing this figure to the corresponding experiment on balanced group cardinalities, we see that the supposition proved true. In fact, the behavior of the FcOMdC-ID method for the individual loss functions is similar, but scaled: similar error values occur for half as many outliers. It is also noted that for the SQR loss function we get errors similar to those of the FcMdC method. As before, the best results were obtained for the LIN, SIG and HUB loss functions.
Another experiment was devoted to checking the effects of the FcOMdC-ID method when overlapping groups are present in the data. For this purpose, the data shown in Fig. 2 were processed: the group located in the bottom right corner was moved up by about 14 units, giving the data presented graphically in Fig. 9. The other groups remained unchanged. Center and spread outliers were added to the data. The effects of the FcMdC and FcOMdC-ID methods for such data and for a varying number of outliers are presented in Fig. 10. Comparing this figure to the corresponding experiment on well-separated groups, we see that in most cases the results are very similar to those obtained for well-separated groups of data. However, a surprise is the better performance of the FcMdC method and the worse performance of the FcOMdC-ID method with the LOG loss function.
The next experiment was conducted to compare the developed method to the trimmed fuzzy c-medoids for interval-valued data (TrFCMd-ID). In conclusion, this experiment shows that for the SIG and HUB loss functions the FcOMdC-ID method leads to smaller errors compared to the TrFCMd-ID method. This is due to the simultaneous use of two mechanisms to obtain behavior resistant to outliers: the loss function and the OWA operation.

4. Application: identification of climatic areas
4.1. Temperature data (Guru's data, 2004, 2005)

In this subsection, an application of the proposed FcOMdC-ID algorithm with various loss functions to a real dataset is shown. We use the minimum and maximum temperatures in degrees centigrade (interval temperature data) over 12 months observed in 37 cities. Data are drawn from Guru et al. [50] and Guru and Kiranagi [49]. The dataset is shown in Table 1. Notice that, with respect to Guru's dataset, in Table 1 we have added another city (Ojmjakon) considered in the next application (Section 4.2). The location map of the 37 cities and Ojmjakon is shown in Fig. 12, while the latitudes and longitudes are reported in Table 2. In the experiments we consider numbers of clusters ranging from 2 to 6, and the parameter of the SOWA ordering function p_c ∈ {0.7, 0.8, 0.9, 1.0}. To obtain the best number of clusters and the value of p_c, the Fuzzy Silhouette (FS) is used [6]:

FS = [Σ_{k=1}^{n} (u_rk − u_qk)^γ s_k] / [Σ_{k=1}^{n} (u_rk − u_qk)^γ], with s_k = (b_k − a_k) / max(a_k, b_k),

where u_rk, u_qk are the first and second largest elements of the kth column of the fuzzy partition matrix U, γ is a weighting coefficient (equal to 1.0 in our experiments), b_k is the minimum of the distances (using dissimilarity D) of the kth datum to all remaining data from other clusters, and a_k is the average distance of the kth datum to the remaining data belonging to its highest membership cluster. Tables 3-5 show, for different values of λ (λ = 1.5, λ = 1.8 and λ = 2.0, respectively), the fuzzy silhouette values obtained for the climatic dataset using the FcOMdC-ID algorithm for various numbers of clusters, values of the SOWA function parameter and types of loss function. Each element of the tables gives the greatest value of FS obtained over 100 independent trials with different random choices of initial medoids. The higher the value of FS, the better the assignment of data to the clusters (clusters are more compact and better separated).
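Under the definitions above, the fuzzy silhouette can be sketched as follows. This is an illustrative implementation of the index [6] following the text's definitions of a_k, b_k and s_k (the function name and the crisp-label shortcut are ours, not the authors' code):

```python
import numpy as np

def fuzzy_silhouette(U, D, gamma=1.0):
    """Fuzzy Silhouette (FS) index, following the definitions in the text.

    U : (c, n) fuzzy partition matrix (columns sum to one)
    D : (n, n) symmetric dissimilarity matrix
    """
    c, n = U.shape
    labels = U.argmax(axis=0)      # highest-membership cluster of each datum
    s = np.zeros(n)
    for k in range(n):
        own = labels == labels[k]
        own[k] = False
        # a_k: average distance to the remaining data of k's own cluster
        a = D[k, own].mean() if own.any() else 0.0
        # b_k: minimum distance to the data from the other clusters
        b = D[k, labels != labels[k]].min()
        s[k] = (b - a) / max(a, b)
    u_sorted = np.sort(U, axis=0)                 # ascending along clusters
    w = (u_sorted[-1] - u_sorted[-2]) ** gamma    # (u_rk - u_qk)^gamma
    return float((w * s).sum() / w.sum())
```

Data near a fuzzy boundary have u_rk ≈ u_qk and thus contribute little to FS, so the index is driven by the well-classified data, which is why it is suited to selecting the number of clusters and p_c here.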
It should be noted that one should not compare the results obtained for different loss functions, because they use different dissimilarity measures. Analysis of these tables allows drawing the following conclusions. First, most often we get the greatest values of the FS for two or four clusters. Second, the impact of the parameter p_c on the quality of the obtained clusters is minimal. This effect is due to the elastic influence of the SOWA ordering function, which reduces the impact of outliers on the resulting clusters by soft downweighting. From Tables 3-5, we deduce that an optimal partition for our FcOMdC-ID clustering method is obtained for the SIG loss function, λ = 2 and p_c = 0.7. The membership degree matrix is shown in Table 6.
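The soft downweighting performed by ordered weighting can be illustrated with Yager's generic OWA operator. The weight vector below is only an illustrative choice; it is not the paper's SOWA weighting generated from p_c:

```python
import numpy as np

def owa(values, weights):
    """Yager's Ordered Weighted Averaging operator.

    The values are sorted in descending order and combined with the
    (normalized) weight vector, so near-zero weights at the top softly
    downweight the largest values, e.g. the residuals of outliers.
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), v))

# Downweighting the single largest residual tames an outlier:
residuals = [1.0, 2.0, 1.5, 50.0]   # 50.0 plays the role of an outlier
w_soft = [0.0, 0.4, 0.3, 0.3]       # near-zero weight on the largest residual
print(owa(residuals, w_soft))       # far below the plain mean of 13.625
```

With uniform weights the operator reduces to the ordinary mean; shifting weight away from the top of the ordering is what makes the aggregation resistant to a few extreme residuals.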
In order to compare the results obtained by our FcOMdC-ID method with the results obtained by some methods proposed in the literature, we also show in Table 6 the membership degree matrices obtained by the following methods: the partition obtained by a panel of human experts [50,49], the MSV (Mutual Similarity Value) agglomerative clustering [50], the FcMd-ID (Fuzzy c-Medoids for Interval Data) method [33], the NcFcM-ID (Fuzzy c-Means for Interval Data) method of D'Urso and Giordani [35], and the TrFcMd-ID (Trimmed Fuzzy c-Medoids for Interval Data) method of D'Urso et al. [33]. In addition, in order to compare the different partitions, we report in Table 7 the Fuzzy Rand Index matrix. Notice that the Fuzzy Rand Index [57] is a fuzzy extension of the original Rand index based on agreements and disagreements in two partitions: ω = (a + d)/(a + b + c + d), where a and d are the numbers of pairs of units belonging to the same/different clusters in both partitions, and b and c are the numbers of pairs of units belonging to the same cluster in one partition and to different clusters in the other. In the partition obtained by the panel of human experts [50,49] (see Table 6), the cities belonging to cluster 1 are located between 0° and 40° latitude and the cities classified under cluster 2 are located between 40° and 60° latitude. However, the observers classified some cities (Athens, Lisbon, San Francisco, Seoul and Tokyo) which are closer to the sea coast and located between the latitudes 0° and 40° as members of cluster 2, because such cities have low temperatures similar to those of the cities located between 40° and 60° latitude. The cities nearer to the sea coast have relatively low temperatures because of the cool breeze from the sea and also due to the high humidity present in the atmosphere.
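The crisp skeleton of the index ω = (a + d)/(a + b + c + d) can be sketched as follows; the fuzzy extension [57] replaces the crisp pair agreements with membership-based ones:

```python
from itertools import combinations

def rand_index(p1, p2):
    """Crisp Rand index over all pairs of units.

    p1, p2 : cluster labels of each unit in the two partitions.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(p1)), 2):
        same1, same2 = p1[i] == p1[j], p2[i] == p2[j]
        if same1 and same2:        # same cluster in both partitions
            a += 1
        elif same1 and not same2:  # same in the first, different in the second
            b += 1
        elif same2 and not same1:  # same in the second, different in the first
            c += 1
        else:                      # different clusters in both partitions
            d += 1
    return (a + d) / (a + b + c + d)
```

Identical partitions give ω = 1, while partitions that disagree on many pairs push ω toward 0, which is how Table 7 quantifies the agreement between the compared methods.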
Mauritius, the only island in the considered dataset, is classified as a member of a singleton cluster, while Tehran is a member of the other singleton because it is characterized by a singular climatic behavior (small fluctuation in temperature; see Fig. 11). The partition obtained through the MSV (Mutual Similarity Value) agglomerative clustering [50] is in accordance with the realistic clusters provided by the panel of human observers. By applying the FcOMdC-ID method (with the SIG loss function, λ = 2 and p_c = 0.7) and the other fuzzy clustering methods, i.e. the timidly robust FcMd-ID method [32] (with λ = 2), the robust NcFcM-ID method [35] (with λ = 2 and noise distance δ given by the average, over the c clusters and the n observations, of the Euclidean distances between the kth observation and the ith centroid) and the TrFcMd-ID method [33] (with λ = 2 and trimming parameter α = 0), we obtain results consistent with the two non-singleton clusters indicated by the panel of experts. This evidence is corroborated by the Fuzzy Rand Index matrix shown in Table 7. Furthermore, notice that in the Guru's dataset there are no outliers, i.e. cities with minimum and maximum temperatures that lie an abnormal distance from the other minimum and maximum temperature values. The results obtained by all robust clustering methods confirm this evidence. In fact, as expected, the obtained results are consistent with the empirical evidence (data without outliers) and thus show a stable behavior of our robust model and of the other robust ones in the examined real context.

4.2. Temperature data with an outlier (modified Guru's data)
In order to show the performance, effectiveness and robustness of our clustering method, we contaminate the Guru's dataset with an outlier, i.e. we consider the monthly minimum and maximum temperatures of a very cold village: Ojmjakon. Ojmjakon is a rural locality in the Ojmjakonsky District of the Sakha Republic, Russia, located along the Indigirka River, 30 km northwest of Tomtor on the Kolyma Highway. Ojmjakon is one of the coldest permanently inhabited places on the planet. With an extreme subarctic climate, Ojmjakon is known as one of the candidates for the Northern Pole of Cold. The ground there is permanently frozen (continuous permafrost). Although winters in Ojmjakon are long and excessively cold, summers are mild, sometimes with hot and very hot days. The climate is quite dry, but as average monthly temperatures are below freezing for seven months of the year, substantial evaporation occurs only in the summer months. Summers are much wetter than winters.
The considered monthly minimum and maximum temperatures, the location, and the latitude and longitude of Ojmjakon are shown together with those of the other cities in Tables 1 and 2 and in Fig. 12.
To show the different monthly climatic features of the cities during the year, in Fig. 13a we represent the violin plots of the midpoints of the interval data in the different months, computed over the entire interval dataset.
The violin plot synergistically combines the box plot and the density trace (or smoothed histogram) into a single display that reveals structure found within the data [55]. In particular, it includes a box plot with two slight modifications. First, a circle replaces the median line, which facilitates quick comparisons when viewing multiple groups. Second, outside points, which are traditionally classified as mild and severe outliers, are not identified by individual symbols. The density trace supplements traditional summary statistics by graphically showing the distributional characteristics of batches of data [55]. The density trace is plotted symmetrically to the left and the right of the (vertical) box plot. There is no difference in these density traces other than the direction in which they extend; adding the two traces gives a symmetric plot which makes it easier to see the magnitude of the density. This hybrid of the density trace and the box plot allows quick and insightful comparison of several distributions [55]. With the addition of the density trace to the box plot, violin plots provide a better indication of the shape of the distribution, including the existence of clusters in the data: the density trace highlights the peaks, valleys, and bumps in the distribution [55]. In this regard, in the application we select λ = 2 by means of the fuzzy silhouette index (see Fig. 14). This result is corroborated by the bimodality of the density traces (in particular in the summer months) of the midpoint temperatures shown in the violin plots in Fig. 13a.
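A violin plot of this kind can be produced, for example, with matplotlib's violinplot. The data below are synthetic midpoint temperatures (a warm and a temperate cluster plus one very cold outlier, mimicking the bimodality described above), not the Guru dataset:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical midpoint temperatures for 12 months over 38 "cities":
# a warm cluster, a temperate cluster, and one Ojmjakon-like outlier.
months = [np.concatenate([rng.normal(25, 3, 21),   # warm cluster
                          rng.normal(8, 3, 16),    # temperate cluster
                          [-45.0]])                # very cold outlier
          for _ in range(12)]

fig, ax = plt.subplots(figsize=(8, 4))
ax.violinplot(months, showmedians=True)  # density trace + median marker
ax.set_xlabel("month")
ax.set_ylabel("midpoint temperature (°C)")
fig.savefig("violin_midpoints.png")
```

In such a display the two clusters appear as two bulges in each violin, while the outlier shows up as a long thin tail, which is exactly the kind of structure Fig. 13a reveals for the modified dataset.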
In addition, from the violin plots shown in Fig. 13a we can see the seasonal pattern of the temperatures and the changes in temperature across the months, and clearly identify the presence of an outlier.
We apply the modified Guru's dataset (Table 1) to our FcOMdC-ID method (with the SIG loss function, λ = 2 and p_c = 0.7) and to other robust fuzzy clustering methods, i.e. the timidly robust FcMd-ID method [33] (with λ = 2), the robust NcFcM-ID method [35] (with λ = 2 and noise distance δ given by the average, over the c clusters and the n observations, of the squared Euclidean distances D²_ik between the kth observation and the ith centroid) and the TrFcMd-ID method [33] (with λ = 2 and trimming parameter α = 0.025). All results are shown in Table 8.
As we can see, the fuzzy partitions obtained by means of the different robust and timidly robust methods are similar, as can also be seen from the Fuzzy Rand Index matrix (Table 9). In fact, all methods identify and, each in a different manner, effectively neutralize the outlier. In particular, the timidly robust FcMd-ID method neutralizes the negative effects of the outlier in the clustering process, since the structure of the clusters is not altered, but it does not clearly identify the outlier. However, as emphasized by García-Escudero et al. [43], a clustering method based on medoids resists the presence of one outlier in a remote position, but it breaks down when the number of outliers increases to three. The TrFcMd-ID method and the NcFcM-ID method identify the outlier and neutralize its negative effect by, respectively, removing it from the analyzed dataset and placing it in the noise cluster. Our method is more informative than the other ones because, by means of the typicality parameter, which represents an important means for alleviating the undesirable effects of outliers, we have a measure of the intensity level of the outlier. In particular, since Ojmjakon presents a very low level of typicality, i.e. β = 0.0119, it is represented in the set of the cities as a village with very atypical temperature levels. Thus, our method is not only able to neutralize and suitably tune the influence of anomalous interval data in the clustering process, but it also provides a suitable measure of the typicality/atypicality level of the outlying interval data.
With reference to the results obtained by means of our method, in Fig. 15 we show the radar plots (zoom stars, in interval data analysis) for the two medoid cities, for the "fuzzy" cities and for the outlier city.
As we can see, the prototype (medoid) of cluster 1, Colombo, represents the cities (belonging to cluster 1) characterized by warm temperatures in all the months of the year, while the medoid of cluster 2, Paris, represents a cluster of cities characterized by a temperate climate. The "fuzzy" cities Athens and Mexico City are warm in summer and cold in the other months; vice versa for the "fuzzy" cities Nairobi and Sydney, which are located in the opposite hemisphere. The fuzziness of Tehran is connected to its climatic conditions: it is very cold in winter and very hot in summer, with small monthly temperature changes. As we can see, the outlier city, Ojmjakon, is characterized by very cold temperatures. These empirical findings are corroborated by the results shown in Figs. 13b and c, in which we represent the violin plots for the midpoint temperatures obtained by considering the cities with membership degrees to cluster 1 and cluster 2 greater than 0.60 (cut-off = 0.60).

Conclusions
Real interval-valued data can contain outliers; therefore, clustering methods need to be robust. This paper combines fuzzy c-medoids clustering with robust ordered statistics using Huber's M-estimator. The developed FcOMdC-ID clustering method is based on various dissimilarity measures (squared, linear, Huber, sigmoidal and logarithmic) and an ordering of the model residuals. The method is introduced as the problem of a constrained minimization of a criterion function. The necessary conditions for obtaining a local minimum of the criterion function with respect to the elements of the partition matrix are shown. The existing fuzzy c-medoids clustering method can be obtained as a special case of the method developed. A study of the FcOMdC-ID method with the traditional fuzzy c-medoids method as the reference is included. These numerical examples show the usefulness of the proposed method when applied to clustering data with different types of outliers.
Furthermore, in order to show the effectiveness of our method in an empirical context, the FcOMdC-ID method is applied to real climatic data and compared with other robust fuzzy clustering methods.