On definition and construction of association measures
Abstract
The definition and general methods of construction of non-statistical association measures on different domains are discussed. An association measure is a function of two variables defined on a set X with an involutive operation and satisfying properties similar to those of Pearson's correlation coefficient. Such a measure can be used to analyze possible positive and negative relationships between variables. Methods of constructing association measures from similarity measures and pseudo-difference operations associated with t-conorms are discussed. Examples of association measures on different domains are considered.
1 Introduction
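Association measures are widely used in data analysis. Different association and correlation measures have been introduced in statistics, data mining, fuzzy set theory, etc. [1, 7, 12, 13, 17] for different types of data. Pearson's correlation coefficient [12] of the n-tuples x = (x_1, …, x_n) and y = (y_1, …, y_n) with means x̄ and ȳ,
$$\mathrm{corr}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1}$$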
is the most popular association measure used for the analysis of possible relationships between variables. Many association measures similar to the correlation coefficient have been proposed, but it is of interest not only to introduce a new association measure for some type of data but also to analyze the class of functions similar to the correlation coefficient and to propose methods for generating them. In [1], a measure of correlation between fuzzy membership functions was proposed that satisfies a set of properties similar to those of Pearson's correlation coefficient. In [6], another set of properties similar to those of Pearson's correlation coefficient was considered, defining time series shape association measures. In [7], general methods of constructing such association measures were proposed, and the sample Pearson's correlation coefficient was obtained as a particular case of the general approach. In [8], the methods proposed in [7] were extended to the general case of functions A : X × X → [-1, 1] defined on a set X with an involutive operation N (called a reflection) and satisfying properties similar to those of Pearson's correlation coefficient. The methods of constructing such measures [8, 9] use similarity measures and pseudo-difference operations associated with t-conorms [2, 15]. In [9], the problems arising in the definition of the general class of functions similar to Pearson's correlation coefficient were discussed. These problems have different causes. First, the properties corr(x, x) = 1 and corr(x, -x) = -1 of the function (1) contradict each other for the n-tuple x = (0, …, 0), for which x = -x. A similar problem arises, in general, for the fixed points of the reflection operation N used in the definition of the association measure A.
Second, the function (1) is not defined for constant n-tuples x = (x_1, …, x_n) = (s, …, s), where s is some real value, because the denominator of (1) equals 0. Similarly, an association measure may not be definable on the whole set X. Such elements of X can be excluded from the domain of the association measure, or the function can be defined on them separately. Third, depending on the domain X, properties specific to this domain can be considered in addition to the general properties of association measures. See, for example, the definition of the time series shape association measure [6, 7].
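The first two problems can be seen numerically. The following is a minimal sketch, assuming the usual sample form of Pearson's correlation coefficient; the function name `pearson` and the choice to return `None` where the coefficient is undefined are illustrative assumptions, not part of the paper.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Returns None when either sequence is constant, because the
    denominator is then zero and the coefficient is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / denom

x = [1.0, 2.0, 4.0]
neg_x = [-a for a in x]

r_self = pearson(x, x)                      # close to 1
r_reflected = pearson(x, neg_x)             # close to -1
r_constant = pearson([3.0, 3.0, 3.0], x)    # undefined: constant tuple
r_zero = pearson([0.0] * 3, [0.0] * 3)      # undefined: x = -x = (0, 0, 0)
```

For the zero tuple both corr(x, x) = 1 and corr(x, -x) = -1 would be required at once, so no value can be assigned consistently.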
The current paper addresses these problems in two ways. First, it explicitly considers association measures defined on a subset V of X on which these problems do not arise. Second, it defines the association measure on the whole set X and relaxes some of the required properties to avoid a contradiction between them.
The current paper also gives proofs of some general results that were stated without proof in the author's previous papers. Some related details can also be found in [10].
The paper has the following structure. Section 2 discusses the definitions and properties of association measures defined on sets with an involutive operation. As an example, a simple association measure on the set of real values is introduced. Section 3 reviews the basic definitions and properties of the fuzzy logic operations used in the following sections. Section 4 considers general methods of constructing association measures and gives the proofs of the related theoretical results. Section 5 considers an example of an association measure constructed by the proposed methods. The conclusions are given in the last section.
2 Association measures
Let X be a set and |X|>1.
Definition 1. Let N : X → X be a function satisfying for all x ∈ X the involutivity property:
$$N(N(x)) = x. \tag{2}$$
N is called a reflection on X if it is not the identity function, i.e. N(x) ≠ x for some x ∈ X. An element x ∈ X such that
$$N(x) = x \tag{3}$$
is called a fixed point of N in X.
The fixed points will be denoted by x_FP, hence:
$$N(x_{FP}) = x_{FP}. \tag{4}$$
Denote by FP(N, X) the set of all fixed points of N in X. This set can be empty.
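Definition 2. Let X be a set with a reflection operation N on X, and let V be a subset of X with |V| > 1 that is closed under N (x ∈ V implies N(x) ∈ V) and such that the restriction of N to V is a reflection on V. A function A : V × V → [-1, 1] satisfying for all x, y ∈ V the properties:
$$A(x, y) = A(y, x) \quad \text{(symmetry)}, \tag{5}$$
$$A(x, x) = 1 \quad \text{(reflexivity)}, \tag{6}$$
$$A(x, N(y)) = -A(x, y) \quad \text{(inverse relationship)}, \tag{7}$$
is called an association measure on V.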
Proposition 1. If A is an association measure on V ⊆ X then V ⊆ X ∖ FP (N, X).
Proof. Suppose Proposition 1 is not true, i.e. A is a function satisfying the properties (5)-(7) on V, and V contains some fixed point x_FP of the reflection N. Then from Equations (4) and (6) we obtain A(x_FP, N(x_FP)) = A(x_FP, x_FP) = 1, but from Equations (7) and (6) we have A(x_FP, N(x_FP)) = -A(x_FP, x_FP) = -1. This contradiction proves the proposition. ■
Consider the following properties of association measures:
$$A(N(x), N(y)) = A(x, y), \tag{8}$$
$$A(N(x), y) = A(x, N(y)). \tag{9}$$
Proposition 2. An association measure A on V satisfies for all x, y ∈ V the properties (8), (9) and the following property:
$$A(x, N(x)) = -1. \tag{10}$$
Proof. Properties (8) and (9) follow from Equations (7) and (5): A(N(x), N(y)) = -A(N(x), y) = -A(y, N(x)) = A(y, x) = A(x, y), and A(N(x), y) = A(y, N(x)) = -A(y, x) = -A(x, y) = A(x, N(y)). Equation (10) follows from Equations (7) and (6): A(x, N(x)) = -A(x, x) = -1. ■
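Propositions 1 and 2 can be spot-checked on a finite set. The sketch below reads the properties as they are used in the proofs above: symmetry A(x, y) = A(y, x) for (5), reflexivity A(x, x) = 1 for (6), inverse relationship A(x, N(y)) = -A(x, y) for (7), and the derived identities (8)-(10). The toy measure `A` on the integers with N(x) = -x and the helper `violated_properties` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def N(x):
    return -x  # reflection on the integers; 0 is its only fixed point

def A(x, y):
    # Hypothetical toy measure (not taken from the paper).
    if x == 0 or y == 0:
        return Fraction(0)
    sign = 1 if x * y > 0 else -1
    return sign * Fraction(min(abs(x), abs(y)), max(abs(x), abs(y)))

def violated_properties(A, N, V):
    """Return the sorted labels of the properties that fail on V."""
    bad = set()
    for x, y in product(V, repeat=2):
        if A(x, y) != A(y, x):
            bad.add("symmetry (5)")
        if A(x, N(y)) != -A(x, y):
            bad.add("inverse relationship (7)")
        if A(N(x), N(y)) != A(x, y):
            bad.add("(8)")
        if A(N(x), y) != A(x, N(y)):
            bad.add("(9)")
    for x in V:
        if A(x, x) != 1:
            bad.add("reflexivity (6)")
        if A(x, N(x)) != -1:
            bad.add("(10)")
    return sorted(bad)

without_fixed_point = violated_properties(A, N, [-3, -2, -1, 1, 2, 3])
with_fixed_point = violated_properties(A, N, [-3, -2, -1, 0, 1, 2, 3])
```

On the set without 0 nothing is violated. Once the fixed point 0 is added, reflexivity and (10) fail for this toy measure, in line with Proposition 1: no choice of A(0, 0) can equal both 1 and -1.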
Definition 3. Let X be a set with a reflection operation N on X and FP (N, X) be a set of fixed points of N. A function A : X × X → [-1, 1] will be called:
1) an association measure of type 1 on X if Equations (5) and (7) are fulfilled for all x, y ∈ X and Equation (6) is fulfilled for all x ∉ FP (N, X);
2) an association measure of type 2 on X if Equations (5) and (6) are fulfilled for all x, y ∈ X and Equation (7) is fulfilled for all x ∈ X and all y ∉ FP (N, X).
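Proposition 3. An association measure A of type 1 on X satisfies for all x, y ∈ X the properties (8), (9) and:
$$A(x, N(x)) = -1 \quad \text{for all } x \notin FP(N, X), \tag{11}$$
$$A(x, x_{FP}) = A(x_{FP}, x) = 0 \quad \text{for all } x_{FP} \in FP(N, X). \tag{12}$$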
Proof. For the proof of Equations (8), (9) and (11) see the proof of Proposition 2. Let us prove Equation (12). From Equations (4), (7) and (5) for all x ∈ X and all x FP ∈ FP (N, X) it follows: A (x, x FP ) = A (x, N (x FP )) = - A (x, x FP ) hence A (x, x FP ) =0 and A (x FP , x) =0. ■
Note that from Proposition 3 it follows that A(x_FP, x_FP) = 0. Although some papers require (6) to hold for all x ∈ X, association measures of type 2 will not be considered in this paper. The property (12) of association measures of type 1 seems more reasonable; see [10].
2.1 Association measures on [0,1]
In [10], an association measure of type 1 on [0,1] related to a strong negation N was considered.
Definition 4. A strong negation on X = [0,1] is a continuous strictly decreasing function N : [0, 1] → [0, 1] satisfying for all x ∈ [0, 1] the following properties:
(13)
(14)
A strong negation is a reflection operation on [0,1] with a unique fixed point, denoted by c. In [10], the class of c-separable association measures of type 1 was considered; these satisfy for all x, y ∈ [0, 1] the properties:
(15)
(16)
(17)
Such association measures can be used to analyze associations between the truth or probability values of plausible statements P and Q. For example, the association between them is negative when one statement has a high plausibility value and the other has a low one.
2.2 Association measures on the set of real values
Let X be the set of real values, X = R, and N(x) = -x for all x in R. We have x_FP = 0 and FP(N, X) = {0}. The association measures of type 1 satisfy on R the properties:
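$$A(x, y) = A(y, x), \tag{18}$$
$$A(x, x) = 1 \quad \text{if } x \neq 0, \tag{19}$$
$$A(x, -y) = -A(x, y). \tag{20}$$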
From Proposition 3 we obtain the following properties of the association measures on R:
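$$A(-x, -y) = A(x, y), \tag{21}$$
$$A(-x, y) = A(x, -y), \tag{22}$$
$$A(x, -x) = -1 \quad \text{if } x \neq 0, \tag{23}$$
$$A(x, 0) = A(0, x) = 0. \tag{24}$$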
By analogy with the c-separable association measures on [0,1], we introduce the following definition.
Definition 5. An association measure A on the set of real values R is called 0-separable (or simply “separable”) if the following properties are fulfilled for all x, y ∈ R:
(25)
(26)
(27)
0-separable association measures have a simple interpretation: x and y are positively associated if they have the same sign and negatively associated if they have opposite signs. Based on these considerations, the following simplest association measure on the set of real values can be proposed.
Proposition 4. The function
(28)
is a 0-separable association measure of type 1 on the set of real values.
The proof is straightforward.
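As a rough illustration of the sign-based interpretation above, the sketch below checks one hypothetical candidate, the product of signs, against the type 1 requirements on R with N(x) = -x: symmetry, the inverse relationship, reflexivity away from 0, and zero association with the fixed point 0. The function `sign_association` is an assumption made for illustration and is not claimed to be the function (28).

```python
from itertools import product

def sign(t):
    """Return -1, 0 or 1 according to the sign of t."""
    return (t > 0) - (t < 0)

def sign_association(x, y):
    # Hypothetical candidate: +1 for equal signs, -1 for opposite signs,
    # 0 when either argument is the fixed point 0 of N(x) = -x.
    return sign(x) * sign(y)

sample = [-3.5, -1.0, 0.0, 0.25, 2.0]
pairs = list(product(sample, repeat=2))

symmetric = all(sign_association(x, y) == sign_association(y, x) for x, y in pairs)
inverse = all(sign_association(x, -y) == -sign_association(x, y) for x, y in pairs)
reflexive_off_zero = all(sign_association(x, x) == 1 for x in sample if x != 0)
zero_at_fixed_point = all(sign_association(x, 0.0) == 0 for x in sample)
```

The check covers only the finite sample, so it is a sanity test rather than a proof.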
2.3 Association measures on the set of time series
Association measures on the set of time series are considered in [6, 7]. A time series of length n (n > 1) is a sequence (n-tuple) of real values x = (x_1, …, x_n). Consider the reflection operation N(x) = -x = (-x_1, …, -x_n) on the set X of all time series of length n. Suppose p, q are real values and p ≠ 0. Define x + y = (x_1 + y_1, …, x_n + y_n) and py + q = (py_1 + q, …, py_n + q). Denote by q(n) the constant time series of length n with all elements equal to q. The n-tuple x_FP = 0(n) is the unique fixed point of N. We write x = const if x = q(n) for some q, and x ≠ const if x_i ≠ x_j for some i ≠ j from {1, …, n}. Denote by X_C the set of all constant time series from X.
Definition 6. Suppose V is a subset of X such that x ∈ V implies -x ∈ V and x + q ∈ V for all real q. A function A : V × V → [-1, 1] satisfying on V the properties (5)–(7) and the property
$$A(x + q, y) = A(x, y) \quad \text{for all real } q \tag{29}$$
is called a shape association measure on V. If x ∈ V implies px ∈ V for all p > 0 and A satisfies on V the property
$$A(px, y) = A(x, y) \quad \text{for all } p > 0, \tag{30}$$
then A is called a scale invariant association measure.
Proposition 5. If A is a shape association measure on V then V ⊆ X ∖ X_C.
Proof. Suppose Proposition 5 is not true, i.e. A is a shape association measure on V, and V contains a constant time series x = s(n), where s is some real value. For q = -2s we have x + q = (s - 2s, …, s - 2s) = (-s)(n) = -x, and from Equations (29) and (5)–(7) we obtain A(x, x) = A(x + q, x) = A(-x, x) = A(x, -x) = -A(x, x) = -1, which contradicts the reflexivity of A. This contradiction proves the proposition. ■
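The key step of this proof is that a constant time series can be shifted onto its own reflection. A minimal sketch of that step follows; the helper names `shift`, `reflect` and `shift_equals_reflection` are hypothetical.

```python
def shift(x, q):
    """Add the constant q to every element of the time series x."""
    return tuple(a + q for a in x)

def reflect(x):
    """Reflection N(x) = -x applied elementwise."""
    return tuple(-a for a in x)

def shift_equals_reflection(x):
    """Return q with x + q == -x if such a q exists, else None."""
    q = -2 * x[0]  # forced by the first element: x_1 + q = -x_1
    return q if shift(x, q) == reflect(x) else None

constant = (3, 3, 3, 3)
non_constant = (1, 2, 4)

q_const = shift_equals_reflection(constant)
q_non_const = shift_equals_reflection(non_constant)
```

For the constant series the shift q = -2s exists, so invariance under shifts forces A(x, x) = A(-x, x); for the non-constant series no such shift exists and the argument does not apply.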
The next section reviews the basic properties of some fuzzy logic operations that will be used later in the construction of association measures.
3 Basic properties of operations of fuzzy logic
Consider the basic properties of the operations of fuzzy logic used in the following sections [2–5, 10, 11, 14–16, 18].
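Definition 7. A t-conorm is a function S : [0, 1]² → [0, 1] satisfying for all x, y, z ∈ [0, 1] the following properties:
$$S(x, y) = S(y, x) \quad \text{(commutativity)},$$
$$S(x, S(y, z)) = S(S(x, y), z) \quad \text{(associativity)},$$
$$S(x, y) \leq S(x, z) \ \text{whenever } y \leq z \quad \text{(monotonicity)},$$
$$S(x, 0) = x \quad \text{(boundary condition)}.$$
From the definition of a t-conorm it follows for all a ∈ [0, 1]: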
(31)
Definition 8. A t-conorm S is nilpotent if there exist x, y ∈ ]0, 1[ such that S(x, y) = 1.
Definition 9. An element x ∈ ]0, 1[ is a nilpotent element of a t-conorm S if there exists y ∈ ]0, 1[ such that S(x, y) = 1.
It is clear that a t-conorm S has no nilpotent elements if and only if for all x, y ∈ [0, 1] it is fulfilled:
(32)
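Consider the simplest, basic, t-conorms:
$$S_M(x, y) = \max(x, y) \quad \text{(maximum)},$$
$$S_P(x, y) = x + y - xy \quad \text{(probabilistic sum)},$$
$$S_L(x, y) = \min(1, x + y) \quad \text{(Lukasiewicz t-conorm)}.$$
Maximum and probabilistic sum have no nilpotent elements, but the Lukasiewicz t-conorm has.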
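The statement about nilpotent elements can be spot-checked on a grid. The sketch below assumes the standard forms of the three basic t-conorms (maximum, x + y - xy, and min(1, x + y)) and the usual t-conorm axioms; the function names and the grid are illustrative choices. Exact rational arithmetic avoids rounding effects in the equality tests.

```python
from fractions import Fraction as F
from itertools import product

def s_max(x, y):
    return max(x, y)

def s_prob(x, y):
    return x + y - x * y

def s_luk(x, y):
    return min(F(1), x + y)

GRID = [F(k, 10) for k in range(11)]        # 0, 1/10, ..., 1
INTERIOR = [g for g in GRID if 0 < g < 1]   # grid points of ]0, 1[

def satisfies_axioms(S):
    """Check the t-conorm axioms on the grid (a finite spot check only)."""
    for x, y, z in product(GRID, repeat=3):
        if S(x, y) != S(y, x):
            return False
        if S(x, S(y, z)) != S(S(x, y), z):
            return False
        if y <= z and S(x, y) > S(x, z):
            return False
    return all(S(x, F(0)) == x for x in GRID)

def nilpotent_elements(S):
    """Grid points x in ]0, 1[ with S(x, y) = 1 for some grid y in ]0, 1[."""
    return [x for x in INTERIOR if any(S(x, y) == 1 for y in INTERIOR)]

axioms_ok = all(satisfies_axioms(S) for S in (s_max, s_prob, s_luk))
nil_max = nilpotent_elements(s_max)
nil_prob = nilpotent_elements(s_prob)
nil_luk = nilpotent_elements(s_luk)
```

On this grid the maximum and the probabilistic sum yield no nilpotent elements, while every interior point is nilpotent for the Lukasiewicz t-conorm, for example S_L(1/10, 9/10) = 1.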
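Definition 10. Let S be a t-conorm.
(i) The S-difference is the operation on [0, 1] defined by
$$a \overset{S}{-} b = \inf\{c \in [0, 1] \mid S(b, c) \geq a\}. \tag{33}$$
(ii) The pseudo-difference associated with S is the operation ⊖_S : [0, 1]² → [-1, 1] defined by
$$a \ominus_S b = \begin{cases} a \overset{S}{-} b, & \text{if } a > b, \\ -(b \overset{S}{-} a), & \text{if } a < b, \\ 0, & \text{if } a = b. \end{cases} \tag{34}$$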
Pseudo-difference
1)
(35)
2) For any a, b ∈ [0, 1] the pseudo-difference ⊖_S associated with S satisfies:
$$a \ominus_S b = -(b \ominus_S a). \tag{36}$$
3) If t-conorm S is continuous at the point 0 in both arguments then the following is fulfilled for all a, b ∈ [0, 1]:
(37)
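The following pseudo-differences are associated with the basic t-conorms S_M, S_P and S_L:
$$a \ominus_M b = \begin{cases} a, & \text{if } a > b, \\ -b, & \text{if } a < b, \\ 0, & \text{if } a = b, \end{cases} \tag{38}$$
$$a \ominus_P b = \begin{cases} \dfrac{a - b}{1 - \min(a, b)}, & \text{if } a \neq b, \\ 0, & \text{if } a = b, \end{cases} \tag{39}$$
$$a \ominus_L b = a - b. \tag{40}$$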
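The pseudo-differences of the basic t-conorms can be checked against the underlying infimum. The sketch below assumes the S-difference in the form usual in the t-conorm literature, inf{c ∈ [0, 1] | S(b, c) ≥ a}, extended antisymmetrically to a pseudo-difference; the closed forms used for the three t-conorms are derived under that assumption, and all function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def s_max(x, y):
    return max(x, y)

def s_prob(x, y):
    return x + y - x * y

def s_luk(x, y):
    return min(F(1), x + y)

def pseudo_difference(s_diff):
    """Extend an S-difference (used only for a > b) antisymmetrically."""
    def pd(a, b):
        if a > b:
            return s_diff(a, b)
        if a < b:
            return -s_diff(b, a)
        return F(0)
    return pd

# Assumed closed forms of the S-difference for a > b.
pd_max = pseudo_difference(lambda a, b: a)
pd_prob = pseudo_difference(lambda a, b: (a - b) / (1 - b))
pd_luk = pseudo_difference(lambda a, b: a - b)

GRID = [F(k, 20) for k in range(21)]

def is_s_difference(S, pd):
    """For a > b, check that c = pd(a, b) gives S(b, c) >= a and that
    no smaller grid value does (a finite spot check of the infimum)."""
    for a, b in product(GRID, repeat=2):
        if a <= b:
            continue
        c = pd(a, b)
        if S(b, c) < a:
            return False
        if any(S(b, d) >= a for d in GRID if d < c):
            return False
    return True

def is_antisymmetric(pd):
    return all(pd(a, b) == -pd(b, a) for a, b in product(GRID, repeat=2))

checks = [is_s_difference(S, pd) and is_antisymmetric(pd)
          for S, pd in ((s_max, pd_max), (s_prob, pd_prob), (s_luk, pd_luk))]
```

For example, with a = 1/2 and b = 1/5 the three pseudo-differences give 1/2, 3/8 and 3/10 respectively.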
Definition 11. An automorphism of the interval [0,1] is a continuous, strictly increasing function φ : [0, 1] → [0, 1] satisfying the boundary conditions φ(0) = 0, φ(1) = 1.
Theorem 1. [3, 18]. A function N : [0, 1] → [0, 1] is a strong negation if and only if there exists an automorphism φ of the unit interval such that
$$N(x) = \varphi^{-1}(1 - \varphi(x)). \tag{41}$$
The function φ in Equation (41) is called a generator of N.
Example 1. The standard negation:
$$N(x) = 1 - x \tag{42}$$
has the generator φ (x) = x and the fixed point c = 0.5.
Example 2. Yager negation:
$$N(x) = (1 - x^p)^{1/p}, \quad p > 0, \tag{43}$$
has the generator φ(x) = x^p and the fixed point c = (1/2)^{1/p}.
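Example 3. The negation introduced by Batyrshin in [10]:
$$N(x) = \begin{cases} 1 - \dfrac{1 - c}{c}\, x, & \text{if } x \leq c, \\ \dfrac{c}{1 - c}\,(1 - x), & \text{otherwise.} \end{cases} \tag{44}$$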
It has the generator:
(45)
This simple strong negation connects by line segments the fixed point (c, c) with the points (0,1) and (1,0). It can be used for construction of strong negations with any fixed point c ∈ ]0, 1[.
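The three examples can be checked for involutivity and for their fixed points. The sketch below assumes the generator representation N(x) = φ⁻¹(1 - φ(x)) of Theorem 1 for the standard and Yager negations, and builds the piecewise-linear negation directly from the geometric description above (line segments through (0, 1), (c, c) and (1, 0)). The function names and the parameter values p = 2 and c = 1/4 are illustrative assumptions.

```python
from fractions import Fraction as F

def negation_from_generator(phi, phi_inv):
    """Build N(x) = phi_inv(1 - phi(x)) from a generator and its inverse."""
    return lambda x: phi_inv(1 - phi(x))

# Standard negation: generator phi(x) = x.
standard = negation_from_generator(lambda x: x, lambda u: u)

# Yager negation with p = 2: generator phi(x) = x**p.
p = 2.0
yager = negation_from_generator(lambda x: x ** p, lambda u: u ** (1 / p))

def piecewise_linear(c):
    """Negation joining (0, 1), (c, c) and (1, 0) by line segments."""
    def N(x):
        if x <= c:
            return 1 - (1 - c) / c * x
        return c / (1 - c) * (1 - x)
    return N

c = F(1, 4)
linear = piecewise_linear(c)

grid = [F(k, 20) for k in range(21)]
float_grid = [k / 20 for k in range(21)]

standard_involutive = all(standard(standard(x)) == x for x in grid)
linear_involutive = all(linear(linear(x)) == x for x in grid)
yager_involutive = all(abs(yager(yager(x)) - x) < 1e-9 for x in float_grid)

yager_fixed_point = 0.5 ** (1 / p)  # solves c**p = 1 - c**p
```

The exact checks use rational arithmetic; the Yager negation involves roots, so it is compared with a small tolerance instead.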
4 Constructing association measures
Consider the methods of construction of association measures using a similarity measure and pseudo-difference operation associated with some t-conorm and prove the related results [8, 9].
Definition 12. A function SIM : X × X → [0, 1] is a similarity measure on X if it satisfies for all x, y ∈ X the properties:
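$$SIM(x, y) = SIM(y, x) \quad \text{(symmetry)}, \tag{46}$$
$$SIM(x, x) = 1 \quad \text{(reflexivity)}. \tag{47}$$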
Definition 13. A similarity measure SIM on X is strict reflexive if
(48)
A strict reflexive similarity measure SIM on X with a reflection N satisfies the property of weak similarity of reflections:
(49)
Proposition 6. A similarity measure SIM satisfies for all x, y ∈ X the cancellation of reflections property:
$$SIM(N(x), N(y)) = SIM(x, y) \tag{50}$$
if and only if it satisfies the permutation of reflections property:
$$SIM(x, N(y)) = SIM(N(x), y). \tag{51}$$
Proof. Equation (50) follows from Equation (51) and the involutivity of N: SIM(N(x), N(y)) = SIM(N(N(x)), y) = SIM(x, y). Equation (51) follows from Equation (50) and the involutivity of N: SIM(x, N(y)) = SIM(N(x), N(N(y))) = SIM(N(x), y). ■
Theorem 2. Suppose X is a set with a reflection N, V ⊆ X ∖ FP(N, X), |V| > 1, V is closed under N, which is a reflection on V, S is a t-conorm, and SIM is a similarity measure on X satisfying the permutation of reflections property. Then the function A_SIM,S : V × V → [-1, 1] defined for all x, y ∈ V by
$$A_{SIM,S}(x, x) = 1, \tag{52}$$
$$A_{SIM,S}(x, N(x)) = -1, \tag{53}$$
$$A_{SIM,S}(x, y) = SIM(x, y) \ominus_S SIM(x, N(y)) \quad \text{if } y \neq x,\ y \neq N(x), \tag{54}$$
is an association measure on V.
Proof. We need to prove only Equations (5) and (7).
Let us prove Equation (5). For y = x Equation (5) is fulfilled trivially.
For y = N(x), from the involutivity of N we have x = N(y), and from Equation (53) we obtain A_SIM,S(x, y) = A_SIM,S(x, N(x)) = -1 and A_SIM,S(y, x) = A_SIM,S(y, N(y)) = -1, hence A_SIM,S(x, y) = A_SIM,S(y, x).
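Suppose y ≠ x and y ≠ N(x); then from the involutivity of N it follows that N(y) ≠ x. From Equation (54) and the symmetry and permutation of reflections properties of SIM we obtain:
$$A_{SIM,S}(x, y) = SIM(x, y) \ominus_S SIM(x, N(y)) = SIM(y, x) \ominus_S SIM(N(x), y) = SIM(y, x) \ominus_S SIM(y, N(x)) = A_{SIM,S}(y, x).$$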
Let us prove Equation (7). For y = x, Equation (7) follows from Equations (53) and (52): A_SIM,S(x, N(y)) = A_SIM,S(x, N(x)) = -1 = -A_SIM,S(x, x) = -A_SIM,S(x, y).
For y = N(x), Equation (7) follows from the involutivity of N and Equations (52) and (53): A_SIM,S(x, N(y)) = A_SIM,S(x, N(N(x))) = A_SIM,S(x, x) = 1 = -A_SIM,S(x, N(x)) = -A_SIM,S(x, y).
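If y ≠ x and y ≠ N(x), then Equation (7) follows from Equation (54), the involutivity of N and Equation (36):
$$A_{SIM,S}(x, N(y)) = SIM(x, N(y)) \ominus_S SIM(x, N(N(y))) = SIM(x, N(y)) \ominus_S SIM(x, y) = -\left(SIM(x, y) \ominus_S SIM(x, N(y))\right) = -A_{SIM,S}(x, y).$$
■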
Theorem 3. If, under the conditions of Theorem 2, SIM is a similarity measure on X satisfying the properties of permutation of reflections and weak similarity of reflections, then the function A_SIM,S : V × V → [-1, 1] defined for all x, y ∈ V by
$$A_{SIM,S}(x, y) = SIM(x, y) \ominus_S SIM(x, N(y)) \tag{55}$$
is an association measure on V if one of the following is fulfilled:
(56)
(57)
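Proof. The symmetry of A_SIM,S follows from the symmetry and the permutation of reflections properties of SIM:
$$A_{SIM,S}(x, y) = SIM(x, y) \ominus_S SIM(x, N(y)) = SIM(y, x) \ominus_S SIM(y, N(x)) = A_{SIM,S}(y, x).$$
The reflexivity of A_SIM,S follows from the reflexivity and weak similarity of reflections properties of SIM and from (35), which requires the fulfilment of (56) or (57):
$$A_{SIM,S}(x, x) = SIM(x, x) \ominus_S SIM(x, N(x)) = 1 \ominus_S SIM(x, N(x)) = 1.$$
The inverse relationship of A_SIM,S follows from the involutivity of N and Equation (55):
$$A_{SIM,S}(x, N(y)) = SIM(x, N(y)) \ominus_S SIM(x, N(N(y))) = SIM(x, N(y)) \ominus_S SIM(x, y) = -A_{SIM,S}(x, y).$$
■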
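The construction of this section can be tried out on a small grid. The following sketch assumes the form A(x, y) = SIM(x, y) ⊖_S SIM(x, N(y)), with the standard negation N(x) = 1 - x on [0, 1], a hypothetical similarity measure SIM(x, y) = 1 - |x - y|, and pseudo-differences assumed for the maximum and Lukasiewicz t-conorms. None of these concrete choices is taken from the paper; they only illustrate the role of the conditions on S and SIM.

```python
from fractions import Fraction as F
from itertools import product

def N(x):
    return 1 - x                      # standard negation on [0, 1]

def sim(x, y):
    return 1 - abs(x - y)             # hypothetical similarity measure

def pd_max(a, b):
    """Pseudo-difference assumed for the maximum t-conorm."""
    if a > b:
        return a
    if a < b:
        return -b
    return F(0)

def pd_luk(a, b):
    """Pseudo-difference assumed for the Lukasiewicz t-conorm."""
    return a - b

def association(pd):
    """Build A(x, y) = SIM(x, y) (-) SIM(x, N(y)) for a pseudo-difference."""
    return lambda x, y: pd(sim(x, y), sim(x, N(y)))

V = [F(k, 10) for k in range(11) if k != 5]   # grid without the fixed point 1/2
PAIRS = list(product(V, repeat=2))

def check(A):
    return {
        "symmetry": all(A(x, y) == A(y, x) for x, y in PAIRS),
        "reflexivity": all(A(x, x) == 1 for x in V),
        "inverse": all(A(x, N(y)) == -A(x, y) for x, y in PAIRS),
    }

permutation_ok = all(sim(x, N(y)) == sim(N(x), y) for x, y in PAIRS)
a_max = association(pd_max)
a_luk = association(pd_luk)
report_max = check(a_max)
report_luk = check(a_luk)
```

With the maximum-based pseudo-difference all three properties hold on the grid. With the Lukasiewicz-based one, symmetry and the inverse relationship still hold but reflexivity fails (A(x, x) = |2x - 1| for this SIM), which suggests why reflexivity needs an additional condition on S or on SIM.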
5 Examples of association measures
The examples of association measures on different domains constructed by the methods discussed in the previous section can be found in [7–10]. The similarity measures satisfying the conditions of Theorems 2 and 3 can be obtained from the distance measures used together with some data transformation [7], from generators of strong negations [3, 5, 9, 10] etc. For example, suppose φ, ψ : [0, 1] → [0, 1] are automorphisms of [0,1] and φ defines by (41) a strong negation N on [0,1]. Then the function
(58)
is a similarity measure on [0,1] that can be used for constructing association measures on [0,1] related to the strong negations (42)-(44) (see [10] for details). Below is an example of the simplest association measure on [0,1] related to the standard negation (42) [10]:
6 Conclusion
The paper gives definitions of association measures generalizing Pearson's correlation coefficient and proposes general methods of constructing such measures. The proofs of the main results are provided. A simple association measure on the set of real numbers is introduced. The considered methods of generating association measures can be used to construct association measures on different domains.
Acknowledgments
This work was partially supported by the project 20151589 of Instituto Politécnico Nacional, Mexico.
References
[1] Murthy CA, Pal SK, Dutta Majumder D (1985) Correlation between two fuzzy membership functions. Fuzzy Sets and Systems 17: 23–38.
[2] Klement EP, Mesiar R, Pap E (2000) Triangular Norms. Kluwer, Dordrecht.
[3] Trillas E (1979) Sobre funciones de negacion en la teoria de conjuntos difusos. Stochastica III(1): 47–59.
[4] Beliakov G, Pradera A, Calvo T (2008) Aggregation functions: A guide for practitioners. Springer.
[5] Batyrshin I (2003) On the structure of involutive, contracting and expanding negations. Fuzzy Sets and Systems 139: 661–672.
[6] Batyrshin I, Sheremetov L, Velasco-Hernandez JX (2012) On axiomatic definition of time series shape association measures. ORADM 2012, Workshop on Operations Research and Data Mining, Cancun, 117–127.
[7] Batyrshin I (2013) Constructing time series shape association measures: Minkowski distance and data standardization. In: BRICS CCI 2013, Porto de Galinhas, Brazil. http://arxiv.org/pdf/1311.1958v3
[8] Batyrshin I (2013) Association measures and aggregation functions. In: Advances in Soft Computing and Its Applications, Lecture Notes in Computer Science 8266: 194–203. Springer.
[9] Batyrshin I (2015) Association measures on sets with involution and similarity measure. Proc 4th World Conference on Soft Computing, Berkeley, California, 2014. To be printed also by Springer.
[10] Batyrshin IZ (2015) Association measures on [0,1]. Journal of Intelligent and Fuzzy Systems 29: 1011–1020.
[11] Fodor J, Roubens M (1994) Fuzzy Preference Modelling and Multi-Criteria Decision Support. Kluwer, Dordrecht.
[12] Hair JFJr, Anderson ER, Tatham RL (1986) Multivariate data analysis with readings. Macmillan Publishing Co., Inc.
[13] Han J, Kamber M (2006) Data mining: Concepts and techniques, 2nd ed. Morgan Kaufmann, Amsterdam.
[14] Zadeh LA (1965) Fuzzy sets. Information and Control 8: 338–353.
[15] Grabisch M, Marichal J-L, Mesiar R, Pap E (2009) Aggregation Functions. Cambridge University Press, Cambridge.
[16] Yager RR (1980) On the measure of fuzziness and negation. II. Lattices. Information and Control 44: 236–260.
[17] Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: Generalizing association rules to correlations. ACM SIGMOD Record: 265–276.
[18] Ovchinnikov S, Roubens M (1991) On strict preference relations. Fuzzy Sets and Systems 43: 319–326.