Posted By

webonomic on 12/02/12

Last Edited at 12/02/12 07:43am

Co Word Analysis with SAS

Published in: SAS

Text Miner stores the term-by-document frequency matrix in a compressed form: as in the sample data below, one row per term/document pair with its count. You will find this OUT data set in the project data directory of your Text Miner run; its label includes the string "OUT". Because a 30,000-document collection can have 500,000 to a million distinct terms, restrict your terms of interest with a start list first. The following code gives an example of creating the co-occurrence matrix: it expands the compressed version into an uncompressed document-by-term table, then computes the co-occurrence counts with PROC CORR and the SSCP option.

data myOUT;
input term doc count;
datalines;
1 1 1
1 3 1
1 4 1
2 2 1
2 3 2
3 1 2
3 3 2
3 4 1
4 2 2
4 4 1
5 3 2
;
run;
 
proc sort data=myOUT;
by  doc term;
run;
 
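data docbyterm;
set myOUT;
by doc;
/* one column per term; t{term} addresses t1-t5 by term number */
array t{5} t1-t5;
retain t1-t5;
/* reset all term columns at the start of each document */
if first.doc then do;
   do i=1 to 5;
      t{i}=0;
   end;
end;
t{term}=count;
/* write one row per document once all of its terms are filled in */
if last.doc then do;
   output;
end;
drop i term count;
run;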
 
 
proc corr data=docbyterm cov outp=cooccur sscp;
var t1-t5;
run;
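
Note that SSCP sums the products of the counts, so the t1/t3 entry for the sample data is 1*2 + 1*2 + 1*1 = 5 rather than the 3 documents the two terms share. If you want the number of documents in which two terms co-occur, one option is to expand to presence/absence flags instead of counts. The following is only a minimal sketch under that assumption: the data set names myOUT_sorted, docbyterm_bin and cooccur_bin are made up for illustration, and it reuses the myOUT sample data from above.

```sas
/* Sketch only: myOUT_sorted, docbyterm_bin and cooccur_bin are illustrative names */
proc sort data=myOUT out=myOUT_sorted;
by doc term;
run;

data docbyterm_bin;
set myOUT_sorted;
by doc;
array t{5} t1-t5;
retain t1-t5;
if first.doc then do;
   do i=1 to 5;
      t{i}=0;
   end;
end;
/* 1 if the term occurs in the document at all, whatever its count */
t{term}=(count > 0);
if last.doc then output;
keep doc t1-t5;
run;

proc corr data=docbyterm_bin sscp outp=cooccur_bin;
var t1-t5;
run;

/* Show only the SSCP rows; skip an Intercept row if PROC CORR wrote one */
proc print data=cooccur_bin;
where _type_ = 'SSCP' and upcase(_name_) ne 'INTERCEPT';
var _name_ t1-t5;
run;
```

With the sample data, terms 1 and 3 both appear in documents 1, 3 and 4, so the t1/t3 entry of this matrix should be 3, and each diagonal entry is the number of documents containing that term.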

URL: https://communities.sas.com/thread/6327?start=0&tstart=0
