Wilks' Lambda

2014. 12. 15. 14:26

■ Wilks' Lambda

1. What is Wilks' Lambda? [1]

A test statistic that is widely used in hypothesis testing.

Wilks' lambda ranges from 0 to 1, and the lower the Wilks' lambda, the larger the between-group dispersion. A small (close to 0) value of Wilks' lambda means that the groups are well separated. A large (close to 1) value of Wilks' lambda means that the groups are poorly separated.

- Takes a value between 0 and 1

- Close to 1: the groups are poorly separated

- Close to 0: the groups are well separated

2. The Wilks' Lambda formula (Λ)

2.1 Meaning of each matrix

W: Within-groups sum of squares and cross-products matrix [2]

--> plays the role of the within-class variance

B: Between-groups sum of squares and cross-products matrix

--> plays the role of the between-class variance

T: Total sum of squares and cross-products matrix

[note] the cross-product matrix X' X is a symmetric matrix.

- Reference [2] explains the concepts of the sum of squares and the cross-product matrix.

2.2 Meaning of each symbol

g: the number of groups (classes)

ni: the number of observations (patterns) in the i-th group

X̄i: the mean vector of the i-th group

X̄: the mean vector of all the observations (the overall mean)

Xij: the j-th multivariate observation in the i-th group
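For reference, the three matrices from 2.1 can be written with the symbols above in their usual textbook form. This is a sketch of the standard definitions rather than a copy of any particular source; the prime denotes the transpose, as elsewhere in this post.

```latex
\begin{aligned}
W &= \sum_{i=1}^{g}\sum_{j=1}^{n_i} (X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)' \\
B &= \sum_{i=1}^{g} n_i\,(\bar{X}_i-\bar{X})(\bar{X}_i-\bar{X})' \\
T &= \sum_{i=1}^{g}\sum_{j=1}^{n_i} (X_{ij}-\bar{X})(X_{ij}-\bar{X})' = W + B \\
\Lambda &= \frac{|W|}{|T|} = \frac{|W|}{|W+B|}
\end{aligned}
```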

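2.3 Interpreting the formula

In the formula Λ = |W|/|T|, where T = W + B, the within-class scatter (W) is in the numerator and the between-class scatter (B) enters through the denominator. So:

- Small within-class variance and large between-class variance: easy to classify (Λ becomes smaller)

- Large within-class variance and small between-class variance: hard to classify (Λ becomes larger)

In other words, the smaller the value of Λ, the more easily the groups can be discriminated.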
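The behavior described above can be checked with a small sketch. It assumes the two-variable case and uses made-up points (not data from this post); the helper names `scatter`, `det2` and `wilks_lambda` are hypothetical and written only for this illustration.

```python
def mean(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def scatter(rows, center):
    """2x2 sum of squares and cross-products of rows around center."""
    s11 = s12 = s22 = 0.0
    for x1, x2 in rows:
        d1, d2 = x1 - center[0], x2 - center[1]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    return [[s11, s12], [s12, s22]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Lambda = |W| / |T| for a list of groups of (x1, x2) observations."""
    all_rows = [r for g in groups for r in g]
    t = scatter(all_rows, mean(all_rows))          # total scatter
    w = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:                               # sum of within-group scatters
        s = scatter(g, mean(g))
        w = [[w[i][j] + s[i][j] for j in range(2)] for i in range(2)]
    return det2(w) / det2(t)

# Made-up data: two groups with the same shape, shifted by different amounts.
base = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (1.2, 1.1)]

def shifted(dx, dy):
    return [(x + dx, y + dy) for x, y in base]

close = wilks_lambda([base, shifted(0.1, 0.1)])    # group means almost identical
far = wilks_lambda([base, shifted(5.0, 5.0)])      # group means far apart
print(close, far)
```

W is identical in both calls because shifting a group does not change its within-group scatter; only B grows, so Λ drops as the groups move apart.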

3. Example

Let's work through the calculation using the values from reference [2]. The focus is on how Wilks' Lambda is actually computed.

Formulation I		Formulation II		Formulation III
Cmax(X1)	AUC(X2)	Cmax(X1)	AUC(X2)	Cmax(X1)	AUC(X2)
0.342	2.1	0.169	1.097	0.091	0.724
0.11	0.747	0.295	1.76	0.264	1.538
0.279	1.833	0.381	2.294	0.463	2.417
0.2	1.32	0.173	1.024	0.19	1.379
0.207	1.245	0.37	2.384	0.101	0.737

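Step 1: First, rearrange the data into a single matrix as below (easier to read).

Group	Cmax(X1)	AUC(X2)
Formulation 1	0.342	2.1
Formulation 1	0.11	0.747
Formulation 1	0.279	1.833
Formulation 1	0.2	1.32
Formulation 1	0.207	1.245
Formulation 2	0.169	1.097
Formulation 2	0.295	1.76
Formulation 2	0.381	2.294
Formulation 2	0.173	1.024
Formulation 2	0.37	2.384
Formulation 3	0.091	0.724
Formulation 3	0.264	1.538
Formulation 3	0.463	2.417
Formulation 3	0.19	1.379
Formulation 3	0.101	0.737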

Step 2: Compute the quantities we need

A) Group means

Group	Cmax(X1)	AUC(X2)
Formulation 1	0.2276	1.449
Formulation 2	0.2776	1.7118
Formulation 3	0.2218	1.359

B) Overall mean

	Cmax(X1)	AUC(X2)
Total Mean	0.242333	1.5066
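The group means and the overall mean can be recomputed with a few lines of plain Python. This is only a sketch; the dictionary layout and the helper name `column_means` are my own choices for illustration.

```python
# Raw data from the table above: (Cmax, AUC) per observation.
data = {
    "Formulation 1": [(0.342, 2.1), (0.11, 0.747), (0.279, 1.833), (0.2, 1.32), (0.207, 1.245)],
    "Formulation 2": [(0.169, 1.097), (0.295, 1.76), (0.381, 2.294), (0.173, 1.024), (0.37, 2.384)],
    "Formulation 3": [(0.091, 0.724), (0.264, 1.538), (0.463, 2.417), (0.19, 1.379), (0.101, 0.737)],
}

def column_means(rows):
    """Mean of each column of a list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

group_means = {name: column_means(rows) for name, rows in data.items()}
all_rows = [row for rows in data.values() for row in rows]
total_mean = column_means(all_rows)

for name, m in group_means.items():
    print(name, [round(v, 4) for v in m])
print("Total mean", [round(v, 6) for v in total_mean])
```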

Step 3: Compute T

A) Subtract the overall column mean from every raw observation (raw data - mean of each column)

Cmax(X1)	AUC(X2)
0.099667	0.5934
-0.13233	-0.7596
0.036667	0.3264
-0.04233	-0.1866
-0.03533	-0.2616
-0.07333	-0.4096
0.052667	0.2534
0.138667	0.7874
-0.06933	-0.4826
0.127667	0.8774
-0.15133	-0.7826
0.021667	0.0314
0.220667	0.9104
-0.05233	-0.1276
-0.14133	-0.7696

- Call this centered matrix Tmatrix (an arbitrary name)

B) Compute T: Tmatrix' * Tmatrix

Here ' denotes the transpose.

Result: T =

0.1751	0.9223
0.9223	5.0445
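Step 3 can be sketched in plain Python as follows. The product Tmatrix' * Tmatrix is written out as explicit sums so that no matrix library is needed; the variable names are assumptions for illustration.

```python
# All 15 observations (Cmax, AUC), in the order of the Step 1 table.
rows = [
    (0.342, 2.1), (0.11, 0.747), (0.279, 1.833), (0.2, 1.32), (0.207, 1.245),
    (0.169, 1.097), (0.295, 1.76), (0.381, 2.294), (0.173, 1.024), (0.37, 2.384),
    (0.091, 0.724), (0.264, 1.538), (0.463, 2.417), (0.19, 1.379), (0.101, 0.737),
]

n = len(rows)
mean1 = sum(r[0] for r in rows) / n
mean2 = sum(r[1] for r in rows) / n

# A) center every observation on the overall column means
tmatrix = [(x1 - mean1, x2 - mean2) for x1, x2 in rows]

# B) T = Tmatrix' * Tmatrix, i.e. T[i][j] = sum over rows of column i times column j
T = [[sum(a[i] * a[j] for a in tmatrix) for j in range(2)] for i in range(2)]

print([[round(v, 4) for v in row] for row in T])
```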

Step 4: Compute W

A) Subtract each group's own mean from the observations in that group

Cmax(X1)	AUC(X2)
0.1144	0.651
-0.1176	-0.702
0.0514	0.384
-0.0276	-0.129
-0.0206	-0.204
-0.1086	-0.6148
0.0174	0.0482
0.1034	0.5822
-0.1046	-0.6878
0.0924	0.6722
-0.1308	-0.635
0.0422	0.179
0.2412	1.058
-0.0318	0.02
-0.1208	-0.622

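B) Compute the within-class scatter of each group

Here Formulation1, Formulation2 and Formulation3 are the three 5-row blocks of the group-centered matrix from A).

covG1 = Formulation1' * Formulation1

covG2 = Formulation2' * Formulation2

covG3 = Formulation3' * Formulation3

covG1 =

0.0307 0.1845
0.1845 1.1223

covG2 =

0.0423 0.2619
0.2619 1.6442

covG3 =

0.0927 0.4203
0.4203 1.9419

C) Add the group matrices together to get the final W

W = covTotal =

0.1657 0.8667
0.8667 4.7084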
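A sketch of Step 4 in plain Python is given below. The helper name `sscp` is hypothetical. Note that the per-group matrices are sums of squares and cross-products (no division by n - 1), so the "cov" prefix in the names is only loose shorthand.

```python
groups = [
    [(0.342, 2.1), (0.11, 0.747), (0.279, 1.833), (0.2, 1.32), (0.207, 1.245)],
    [(0.169, 1.097), (0.295, 1.76), (0.381, 2.294), (0.173, 1.024), (0.37, 2.384)],
    [(0.091, 0.724), (0.264, 1.538), (0.463, 2.417), (0.19, 1.379), (0.101, 0.737)],
]

def sscp(rows):
    """Sum of squares and cross-products of rows around their own mean."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    centered = [(x1 - m1, x2 - m2) for x1, x2 in rows]   # A) subtract the group mean
    return [[sum(a[i] * a[j] for a in centered) for j in range(2)] for i in range(2)]

# B) one 2x2 matrix per group
cov_g = [sscp(g) for g in groups]

# C) W is the element-wise sum of the group matrices
W = [[sum(c[i][j] for c in cov_g) for j in range(2)] for i in range(2)]

for c in cov_g:
    print([[round(v, 4) for v in row] for row in c])
print("W =", [[round(v, 4) for v in row] for row in W])
```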

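Step 5: Compute Wilks' Lambda

Λ = |W|/|T|

|W| = 0.029

|T| = 0.033

Λ = 0.029/0.033 = 0.879

(These determinants are rounded to three decimals; carrying more digits gives roughly 0.886, which leads to the same conclusion.)

That's it. The value of Wilks' Lambda is close to 1, so it is hard to clearly distinguish the target groups (the three formulations) using Cmax and AUC.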
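The whole calculation, from raw data to Λ, can be sketched end to end in plain Python. The helper names `sscp` and `det2` are hypothetical and the code covers only the two-variable case. Both determinants are small differences of much larger products, so rounding them before dividing moves the ratio slightly; the value printed here may differ from the hand calculation in the second decimal place.

```python
groups = [
    [(0.342, 2.1), (0.11, 0.747), (0.279, 1.833), (0.2, 1.32), (0.207, 1.245)],
    [(0.169, 1.097), (0.295, 1.76), (0.381, 2.294), (0.173, 1.024), (0.37, 2.384)],
    [(0.091, 0.724), (0.264, 1.538), (0.463, 2.417), (0.19, 1.379), (0.101, 0.737)],
]

def sscp(rows):
    """Sum of squares and cross-products of rows around their own mean."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    c = [(x1 - m1, x2 - m2) for x1, x2 in rows]
    return [[sum(a[i] * a[j] for a in c) for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# T: all observations centered on the overall mean
T = sscp([row for g in groups for row in g])

# W: sum of the per-group matrices, each centered on its own group mean
parts = [sscp(g) for g in groups]
W = [[sum(p[i][j] for p in parts) for j in range(2)] for i in range(2)]

det_w, det_t = det2(W), det2(T)
lam = det_w / det_t
print(round(det_w, 3), round(det_t, 3), round(lam, 3))
```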

:: It's been a while since I last wrote a post... and as expected, it isn't easy.

4. Implementation results

Reference

[1] http://www.ijpsi.org/VOl(2)1/Version_3/G0213644.pdf

[2] http://stattrek.com/matrix-algebra/sums-of-squares.aspx

Attribution, NonCommercial
