
starting som prediction fine-tuned class-performance visualisation

git-svn-id: https://svn.discofish.de/MATLAB/spmtoolbox/SVMCrossVal@112 83ab2cfd-5345-466c-8aeb-2b2739fb922d

Christoph Budziszewski authored on 21/01/2009 16:34:25

%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200 

clf reset;
figure(gcf)
echo on



clc
%    ==========================================================
%    SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
%    ==========================================================

%    som_make        - Create, initialize and train a SOM.
%     som_randinit   - Create and randomly initialize a SOM.
%     som_lininit    - Create and linearly initialize a SOM.
%     som_seqtrain   - Train a SOM with the sequential algorithm.
%     som_batchtrain - Train a SOM with the batch algorithm.
%    som_bmus        - Find best-matching units (BMUs).
%    som_quality     - Measure the quality of a SOM.

%    SELF-ORGANIZING MAP (SOM):

%    A self-organizing map (SOM) is a "map" of the training data, 
%    dense where there is a lot of data and thin where the data 
%    density is low. 

%    The map consists of neurons located on a regular map grid. 
%    The lattice of the grid can be either hexagonal or rectangular.

subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')

subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')

%    Each neuron (hexagon on the left, rectangle on the right) has an
%    associated prototype vector. After training, neighboring neurons
%    have similar prototype vectors.

%    The SOM can be used for data visualization, clustering (or 
%    classification), estimation and a variety of other purposes.

pause % Strike any key to continue...

clf
clc
%    INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
%    ============================================

%    Here are 300 data points sampled from the unit square:

D = rand(300,2);

%    The map will be a 2-dimensional grid of size 10 x 10.

msize = [10 10];

%    SOM_RANDINIT and SOM_LININIT can be used to initialize the
%    prototype vectors in the map. The map size is actually an
%    optional argument. If omitted, it is determined automatically
%    based on the number of data vectors and the principal
%    eigenvectors of the data set. Below, the random initialization
%    algorithm is used.

sMap  = som_randinit(D, 'msize', msize);

%    Actually, each map unit can be thought of as having two sets
%    of coordinates: 
%      (1) in the input space:  the prototype vectors
%      (2) in the output space: the position on the map
%    In the two spaces, the map looks like this: 

subplot(1,3,1) 
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')

subplot(1,3,2) 
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')

%    The black dots show the positions of map units, and the gray lines
%    show connections between neighboring map units.  Since the map
%    was initialized randomly, the positions in the input space are
%    completely disorganized. The red crosses are training data.

pause % Strike any key to train the SOM...

%    During training, the map organizes and folds to the training
%    data. Here, the sequential training algorithm is used:

sMap  = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')

pause % Strike any key to view the training process more closely...


clf

clc
%    TRAINING THE SELF-ORGANIZING MAP
%    ================================

%    To get a better idea of what happens during training, let's look
%    at how the map gradually unfolds and organizes itself. To make it
%    even clearer, the map is now initialized so that it is away
%    from the data.

sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;

subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')

%    The training is based on two principles: 
%     
%      Competitive learning: the prototype vector most similar to a
%      data vector is modified so that it is even more similar to
%      it. This way the map learns the position of the data cloud.
%
%      Cooperative learning: not only the most similar prototype
%      vector, but also its neighbors on the map are moved towards the
%      data vector. This way the map self-organizes.

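%    As a minimal sketch (not part of the original demo), one sequential
%    update step can be written out by hand. The sample x, learning rate
%    alpha, and the use of a copy sTmp are illustrative choices, and only
%    the BMU itself is moved here; SOM_SEQTRAIN additionally moves the
%    BMU's map neighbors with a distance-dependent neighborhood weight.

sTmp  = sMap;                        % work on a copy, keep the demo intact
x     = D(1,:);                      % one training sample
dist2 = sum((sTmp.codebook - repmat(x,size(sTmp.codebook,1),1)).^2, 2);
[dummy,c] = min(dist2);              % competitive step: find the BMU
alpha = 0.1;                         % learning rate
sTmp.codebook(c,:) = sTmp.codebook(c,:) + alpha*(x - sTmp.codebook(c,:));
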
pause % Strike any key to train the map...

echo off
subplot(1,2,2)
o = ones(5,1);
r = (1-[1:60]/60);
for i=1:60,
  sMap = som_seqtrain(sMap,D,'tracking',0,...
		      'trainlen',5,'samples',...
		      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  drawnow
end
title('Sequential training after 300 steps')
echo on

pause % Strike any key to continue with 3D data...

clf

clc
%    TRAINING DATA: THE UNIT CUBE
%    ============================

%    Above, the map dimension was equal to the input space dimension:
%    both were 2-dimensional. Typically, the input space dimension is
%    much higher than the 2-dimensional map. In this case the map can
%    no longer follow the data set perfectly, but must find a balance
%    between two goals:

%      - data representation accuracy
%      - data set topology representation accuracy    

%    Here are 500 data points sampled from the unit cube:

D = rand(500,3);

subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')

%    The ROTATE3D command enables you to rotate the picture by
%    dragging the pointer over the picture with the left mouse
%    button pressed down.

pause % Strike any key to train the SOM...




clc
%    DEFAULT TRAINING PROCEDURE
%    ==========================

%    Above, the initialization was done randomly and training was done
%    with the sequential training function (SOM_SEQTRAIN). By default,
%    the initialization is linear, and the batch training algorithm is
%    used. In addition, the training is done in two phases: first with
%    a large neighborhood radius, and then finetuning with a small radius.

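%    Roughly, the default two-phase procedure corresponds to something
%    like the following sketch (the radius and trainlen values here are
%    illustrative, not the exact defaults computed by SOM_MAKE):
%
%      sM = som_lininit(D);
%      sM = som_batchtrain(sM,D,'radius',[3 1],'trainlen',10); % rough phase
%      sM = som_batchtrain(sM,D,'radius',[1 1],'trainlen',40); % finetuning
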
%    The function SOM_MAKE can be used to both initialize and train
%    the map using default parameters:

pause % Strike any key to use SOM_MAKE...

sMap = som_make(D);

%    Here, the linear initialization is done again, so that 
%    the results can be compared.

sMap0 = som_lininit(D); 

subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
	 'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3)) 
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
	 'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3)) 
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on

%    Here you can see that the 2-dimensional map has folded into the
%    3-dimensional space in order to be able to capture the whole data
%    space. 

pause % Strike any key to evaluate the quality of the maps...



clc
%    BEST-MATCHING UNITS (BMU)
%    =========================

%    Before turning to quality, an important concept needs to be
%    introduced: the Best-Matching Unit (BMU). The BMU of a data
%    vector is the unit on the map whose model vector best resembles
%    the data vector. In practice the similarity is measured as the
%    minimum distance between the data vector and each model vector on
%    the map. The BMUs can be calculated using the function SOM_BMUS,
%    which returns the index of the unit.

%    Here the BMU is searched for the origin point (from the
%    trained map):

bmu = som_bmus(sMap,[0 0 0]);

%    Here the corresponding unit is shown in the figure. You can
%    rotate the figure to see better where the BMU is.

co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')

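%    As a minimal sketch (not part of the original demo), the same BMU
%    can also be found by hand with a Euclidean distance search, which
%    is essentially what SOM_BMUS does internally:

x = [0 0 0];
dist2 = sum((sMap.codebook - repmat(x,size(sMap.codebook,1),1)).^2, 2);
[dummy,bmu2] = min(dist2);     % bmu2 agrees with bmu above (up to ties)
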
pause % Strike any key to analyze map quality...




clc
%    SELF-ORGANIZING MAP QUALITY
%    ===========================

%    The maps have two primary quality properties:
%      - data representation accuracy
%      - data set topology representation accuracy

%    The former is usually measured using the average quantization
%    error between data vectors and their BMUs on the map.  For the
%    latter several measures have been proposed, e.g. the topographic
%    error measure: the percentage of data vectors for which the
%    first and second BMUs are not adjacent units.

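%    As a minimal sketch (not part of the original demo), the average
%    quantization error can be computed by hand from the BMUs as the
%    mean Euclidean distance between each data vector and its BMU:

bmus = som_bmus(sMap,D);
qe_manual = mean(sqrt(sum((D - sMap.codebook(bmus,:)).^2, 2)));
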
%    Both measures have been implemented in the SOM_QUALITY function.
%    Here are the quality measures for the trained map: 

[q,t] = som_quality(sMap,D)

%    And here for the initial map:

[q0,t0] = som_quality(sMap0,D)

%    As can be seen, by folding, the SOM has reduced the average
%    quantization error, but on the other hand the topology
%    representation capability has suffered.  By using a larger final
%    neighborhood radius in the training, the map becomes stiffer and
%    preserves the topology of the data set better.


echo off
