Host Systems and Environments
LIBERATING DATA IMPRISONED IN A VSAM GULAG
George P Sharrard, Ph.D.
Burton, Greene, Smolka, & Associates
NESUG '92 Proceedings

ABSTRACT
VSAM files fill a well-defined and well-understood niche in the data processing world. VSAM files are the place all important corporate data is sent, never to be seen again! The dreaded 'data police' will see to it that every bit of data your marketing, advertising and strategic planning departments could ever want is, euphemistically speaking, "saved." Should you ever ask for some of it back, your name will be entered into a file (generally called "the systems enhancement request list") and you will be monitored in the future. Repeated requests to get your data back will only produce blank stares and references to some "COPYLIB" somewhere or, worse yet, to a particular CICS screen.

Somewhere along the way you will be interrogated as to why the hundreds of standard reports the current system provides will not meet your needs. ("Because it's the wrong information" is not an acceptable reason.)

Don't give up! SAS can set your data free!!!

This paper will present three concrete examples of how SAS can access VSAM files:
Sequential read
Keyed read
Keyed sequential read
Major points as to what to look for in a COPYBOOK will also be presented.

GETTING STARTED
Some important things you need to be aware of before you enter the world of VSAM:
1. VSAM is an indexed file structure that allows you to access specific records without having to read through the entire file (just like an index on a SAS dataset).
2. Key = index.
3. There can be more than one key on a file, sometimes called the alternate index or secondary key.
4. The VSAM files you want may be 'owned' by an on-line system. CICS or a specific application (M&D or CA are well-known companies whose applications read/write VSAM files) may have an exclusive lock on the files your data is in. If this is true, you can only get at the data when the on-line system is down (or the file you want is 'closed' to the on-line system).
5. There are different kinds of VSAM files. The SAS Guide to VSAM Processing, Version 5 Edition (publication number 5605), chapter 2, 'Overview of VSAM Concepts', has a very readable discussion of the different types of VSAM files.
6. In the COBOL world, the codebook/machine-readable dictionary file description for a VSAM file is called a copybook. Copybooks generally are saved in a PDS called COPYLIB. While you may think that the existence of a codebook will make your search for data easier, beware!! Copybooks are the stuff nightmares are made of.
7. There is nothing magic about VSAM files. After all, you usually find them in COBOL shops.
We will now look, in detail, at three very useful methods of getting at data held in a VSAM structure.

Situation 1
You work for a multinational pharmaceutical company. 10 years ago, they bought a canned application to do AP (accounts payable). This package can generate reports based on predefined fields (defined 10 years ago) and perform simple sorts. A new type of report is needed. The current staff only knows how to run the canned reports, and it is suggested that what you are requesting is a 'major systems enhancement' (lots of money, takes a long time).

With an 'enhanced' level of incredulity, you request the file name(s), stating that you will do the report yourself using SAS. The manager of the AP department is at a loss. Authority has been challenged, ignorance exposed. Casting what is hoped will be a trump card, the AP manager states: 'Yes, but it's a VSAM file allocated exclusive to M940.'

So....
1. Find out when M940 is brought down. Usually, this happens after all the data entry personnel go home for the night.
2. You need every record in the AP file and don't care about its keys. A simple sequential read will do it.
3. The only difficulty is to figure out the offset (column location) of the data elements you are interested in. For the sake of this example, let's say you have this information (it was in the manual). The following code is all you need.

SEQUENTIAL READ OF VSAM FILE USING SAS

    FILENAME EXPNSIN 'LIVE.PRJV.EXP';
    DATA EXPNS;
      INFILE EXPNSIN VSAM MISSOVER
             LRECL=200 LINESIZE=200;
      INPUT FIRMNUM          1-4
            INVOICE          5-9
            RECNUM    $     10-16
            RECNUM2         14-16
            BORS      $     17
            DESC      $     18-37
            SKU       $     52-59
            @62 PPUNIT      PD7.2
            @75 TOTUNITS    PD6.0;
    RUN;

On the INFILE statement it is necessary to specify VSAM. If you omit the VSAM option, the MISSOVER option will not work. As it is reasonable to assume that you will encounter variable-length records when reading VSAM files, it is imperative that you have access to the MISSOVER option.

Situation 2
You work at a mail order clothing company.
Your marketing department conducted a telephone survey with 1,000 current customers and
500 former customers asking about everything
except the specifics of past purchases. After all,
that's on the company's customer database.
Why use up valuable telephone time asking about purchase information that we already store in the purchase history file (a file where each customer has one record with the number of orders and $ amount spent for each of the 16 clothing groups in our catalog)?
Here's what you have:
1. The survey data has been entered and saved as a SAS file where the variable CUSTNUM (numeric length 9) is the unique customer identifier on both the survey and the customer database.
2. Your customer database. It's huge (3,000,000+ records), so you do not want to merge the survey file with the customer file, as in:

    data both;
      merge survey (in=surin) housefil;
      by custnum;
      if surin;
    run;

Consider the following:

KEYED VSAM READ USING SAS

    LIBNAME SURVEY 'MKT001.GEORGE.SUR92';
    FILENAME HOUSEFIL 'LIVE.PRJV.HSF';
    DATA BOTH;
      SET SURVEY;
      KEYVAR=CUSTNUM;
      INFILE HOUSEFIL VSAM KEY=KEYVAR
             FEEDBACK=SASRC;
      INPUT ORDERS 19-21
            @25 (MTYPE1-MTYPE16) (6.2);
      IF SASRC = 4 OR SASRC = 16 THEN DO;
        _ERROR_=0;
        IF SASRC=4 THEN STOP;
        ELSE DO;
          SASRC=0;
          PUT 'NO RECORD WITH KEY= ' KEYVAR;
          RETURN;
        END;
      END;
    RUN;
KEY=KEYVAR: this is how we make use of the index on the data file. Instead of reading each record sequentially and testing to see if it is one that we want, we will go directly to each record that is a match for a customer in our survey. We avoid doing 3,000,000 reads and 3,000,000 customer number compares. A keyed lookup on 1,500 survey respondents takes 1,500 reads.

FEEDBACK=SASRC: we make a new variable (SASRC) and set it equal to the value of the system-generated return code (FEEDBACK). SASRC's value is set after each lookup.

If SASRC = 4 or 16, this is very bad! 4 is terminal and you have to STOP. 16 means your lookup failed because the key you used was not found. In either case, reset _ERROR_ to 0 so you can continue.

If SASRC is neither 4 nor 16, you have your data.
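The saving from a keyed read is easy to see outside SAS. Below is a minimal Python sketch, not SAS and not how VSAM is actually implemented. A dict stands in for the KSDS index. `keyed_lookup` returns a feedback code of 16 when the key is absent, borrowing the paper's convention, while `sequential_scan` reads every record. The file size and keys are hypothetical.

```python
# Illustration only: a dict stands in for a VSAM KSDS index.
# Keys, record contents and file size are hypothetical.
from typing import Dict, List, Optional, Set, Tuple

RC_OK = 0
RC_NOT_FOUND = 16  # borrowed from the paper's "key not found" check


def keyed_lookup(index: Dict[str, bytes], key: str) -> Tuple[Optional[bytes], int]:
    """Direct keyed read: one probe per key, like KEY=KEYVAR."""
    record = index.get(key)
    if record is None:
        return None, RC_NOT_FOUND
    return record, RC_OK


def sequential_scan(records: List[Tuple[str, bytes]], wanted: Set[str]) -> Tuple[Dict[str, bytes], int]:
    """Read every record and compare its key; return matches and reads done."""
    matches: Dict[str, bytes] = {}
    reads = 0
    for key, rec in records:
        reads += 1
        if key in wanted:
            matches[key] = rec
    return matches, reads


if __name__ == "__main__":
    # Hypothetical customer file with 100,000 nine-digit keys.
    records = [(f"{n:09d}", f"REC{n}".encode()) for n in range(100_000)]
    index = dict(records)
    survey = ["000000042", "000012345", "999999999"]  # the last key is absent

    keyed_reads = 0
    for cust in survey:
        rec, rc = keyed_lookup(index, cust)
        keyed_reads += 1
        if rc == RC_NOT_FOUND:
            print("NO RECORD WITH KEY=", cust)
    _, scan_reads = sequential_scan(records, set(survey))
    print("keyed reads:", keyed_reads)      # 3
    print("sequential reads:", scan_reads)  # 100000
```

The keyed approach costs one read per survey respondent no matter how large the customer file grows. The scan costs one read per record in the file.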
Situation 3
While the above code will work for a lookup where each survey respondent matches only one record in the purchase history file, what do we do when we need to look at each order over the past 2 years (up to 9 orders for any single customer) to see the duration of time between orders? We need to retrieve DOF-ORD-DT for each order in the detailed order file for the 1,000 surveyed customers and 500 surveyed former customers. The way to get to the DETAIL-ORDER-FILE is to use the CUSTNUM to read the HOUSE-FILE. In the HOUSE-FILE there is a variable SEGMENT-CODE that, when concatenated to the CUSTNUM, creates a partial key to the DETAIL-ORDER-FILE.
KEYED SEQUENTIAL VSAM READ

    LIBNAME SURVEY 'MKT01.JIM.SUR92';
    FILENAME HOUSEFIL 'LIVE.PRJV.HSF';
    FILENAME PURHIST 'LIVE.PRJV.PURH';
    DATA BOTH;
      SET SURVEY;
      KEYVAR=CUSTNUM;
      * First keyed lookup: get SEGCODE from the house file;
      INFILE HOUSEFIL VSAM KEY=KEYVAR
             FEEDBACK=SASRC;
      INPUT SEGCODE $ 19-23;
      IF SASRC = 4 OR SASRC = 16 THEN DO;
        _ERROR_=0;
        IF SASRC=4 THEN STOP;
        ELSE DO;
          SASRC=0;
          PUT 'NO RECORD WITH KEY= ' KEYVAR;
          SEGCODE='XXXXX';
          RETURN;
        END;
      END;
      * Second lookup: partial (generic) key into the detail order file;
      KEYCUS=CUSTNUM || SEGCODE;
      INFILE PURHIST VSAM KEY=KEYCUS
             GENKEY SKIP FEEDBACK=SASRC;
      FORMAT KEYCUS2 $14.;
      INPUT VARKEY  $  41-54
            SEQNUM     55-56
            ORDDATE $ 227-232;
      IF SASRC = 4 OR SASRC = 16 THEN DO;
        _ERROR_=0;
        IF SASRC=4 THEN STOP;
        ELSE DO;
          SASRC=0;
          PUT 'NO RECORD WITH KEY= ' KEYCUS;
          RETURN;
        END;
      END;
      * Keep reading forward while the partial key stays the same;
      KEYCUS2=VARKEY;
      DO WHILE (KEYCUS2=VARKEY);
        SELECT (SEQNUM);
          WHEN (1) ORD1=ORDDATE;
          WHEN (2) ORD2=ORDDATE;
          WHEN (3) ORD3=ORDDATE;
          WHEN (4) ORD4=ORDDATE;
          WHEN (5) ORD5=ORDDATE;
          WHEN (6) ORD6=ORDDATE;
          WHEN (7) ORD7=ORDDATE;
          WHEN (8) ORD8=ORDDATE;
          WHEN (9) ORD9=ORDDATE;
          * Without OTHERWISE, an unexpected SEQNUM stops the step;
          OTHERWISE;
        END;
        INPUT KEYCUS2 $   1-14
              SEQNUM     55-56
              ORDDATE $ 227-232;
        IF SASRC = 4 OR SASRC = 16 THEN DO;
          _ERROR_=0;
          IF SASRC=4 THEN STOP;
          ELSE DO;
            SASRC=0;
            PUT 'NO RECORD WITH KEY= ' KEYCUS2;
            RETURN;
          END;
        END;
      END;
    RUN;
Here we have 2 keyed lookups. First, a quick stop at the house file to get SEGMENT-CODE. SEGCODE is concatenated to CUSTNUM to form a 'partial key' to the DETAIL-ORDER-FILE (DOF). The full key to the DOF is:
    CUSTNUM || SEGCODE || SEQNUM
where SEQNUM takes values 01 - 09.

GENKEY is specified so as to allow SAS to use the partial key to find the first matching record in the DOF.

SKIP tells SAS to stop using direct keyed reads and now process sequentially. We continue to read records until a new partial key value is encountered (that's the DO WHILE part).
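The GENKEY/SKIP pattern amounts to this: position on the first key that begins with the partial key, then read forward until the prefix changes. Here is a rough Python sketch of that logic. It is an analogy, not the SAS/VSAM implementation. A sorted list of keys searched with `bisect_left` stands in for the KSDS index. The sample keys assume a 9-byte CUSTNUM, a 5-byte SEGCODE and a 2-byte SEQNUM, and they are hypothetical.

```python
# Analogy for a generic-key read followed by sequential reads (GENKEY + SKIP).
# A sorted list stands in for the KSDS index; the keys are hypothetical.
from bisect import bisect_left
from typing import List


def read_by_partial_key(sorted_keys: List[str], partial: str) -> List[str]:
    """Position at the first key >= partial, then read forward while the
    key still starts with partial (the DO WHILE loop in the SAS code)."""
    pos = bisect_left(sorted_keys, partial)  # generic-key positioning
    hits: List[str] = []
    while pos < len(sorted_keys) and sorted_keys[pos].startswith(partial):
        hits.append(sorted_keys[pos])        # sequential read
        pos += 1
    return hits


if __name__ == "__main__":
    # Assumed full key: CUSTNUM (9) + SEGCODE (5) + SEQNUM (2) = 16 bytes.
    keys = sorted([
        "000000042AB12301", "000000042AB12302", "000000042AB12303",
        "000000043CD45601",
    ])
    partial = "000000042" + "AB123"          # CUSTNUM || SEGCODE, 14 bytes
    for key in read_by_partial_key(keys, partial):
        print(key[:14], "order", int(key[14:16]))
```

A single positioning step is followed by cheap forward reads. This is why the SAS code performs one keyed read per customer rather than one per possible SEQNUM.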
NEW THINK AND DOUBLE SPEAK
THE LANGUAGE OF THE COBOL COPYBOOK
COBOL boosters like to say that COBOL code
is self-documenting and that clear/accurate data
definition is built into the copybook structure.
Here are two common copybook 'features' that
can confuse and mislead.
ONE DATA FIELD - MANY MEANINGS
    05  UMX-PAYMENT-ELEMENT-1              PIC X(12).
    05  UMX-ORDER-PAYMENT-1 REDEFINES UMX-PAYMENT-ELEMENT-1.
        10  UMX-PAYMENT-SEC-CNTRY-1        PIC XX.
        10  UMX-PAYMENT-SEC-1              PIC X(9).
        10  UMX-PAYMENT-SEC-CHK-1          PIC X.
    05  UMX-CASH-PAYMENT-1 REDEFINES UMX-PAYMENT-ELEMENT-1.
        10  UMX-PAY-CURRENCY-CODE-1        PIC XXX.
        10  UMX-PAY-CURRENCY-FILL-1        PIC X(9).
            88  UMX-PAY-IS-CASH-1          VALUE ' CURRENCY'.
Each 'REDEFINES' is reading the same data
but giving it different meanings. It would be
incorrect to count each 'field name' when
establishing the offset (range of columns) for
each of the variable locations.
The numbers at the far left of each line are called level numbers. They represent the hierarchy of the data structure being defined. That is, the 05 level is made up of all the level 10s that come under it, up to the next 05. Level 10s can be subdivided into level 15s, 15s into 20s, and so on.
In the above example, there are only 12 columns being described. These 12 columns may be referenced as one 12-byte field (UMX-PAYMENT-ELEMENT-1), as 2 bytes + 9 bytes + 1 byte (first REDEFINES), or as 3 bytes + 9 bytes (second REDEFINES). BUT, there are only 12 bytes being described. It is as if you wrote your SAS INPUT statement to keep rereading the same columns over and over again.
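The offset arithmetic can be made concrete with a small Python sketch. It assigns 1-based column ranges to the UMX fields and lets each REDEFINES group restart at the column of the item it redefines, instead of advancing. The layout is hand-transcribed into a list; this is not a COBOL copybook parser. The helper name `column_ranges` is hypothetical.

```python
# Hand-transcribed UMX layout; not a COBOL parser. Each entry is
# (05-level name, name of the item it REDEFINES or None, [(field, bytes), ...]).
from typing import Dict, List, Optional, Tuple

Layout = List[Tuple[str, Optional[str], List[Tuple[str, int]]]]

LAYOUT: Layout = [
    ("UMX-PAYMENT-ELEMENT-1", None, [("UMX-PAYMENT-ELEMENT-1", 12)]),
    ("UMX-ORDER-PAYMENT-1", "UMX-PAYMENT-ELEMENT-1",
     [("UMX-PAYMENT-SEC-CNTRY-1", 2), ("UMX-PAYMENT-SEC-1", 9),
      ("UMX-PAYMENT-SEC-CHK-1", 1)]),
    ("UMX-CASH-PAYMENT-1", "UMX-PAYMENT-ELEMENT-1",
     [("UMX-PAY-CURRENCY-CODE-1", 3), ("UMX-PAY-CURRENCY-FILL-1", 9)]),
]


def column_ranges(layout: Layout, start: int = 1) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Return 1-based (first, last) columns per field and total bytes described."""
    ranges: Dict[str, Tuple[int, int]] = {}
    item_start: Dict[str, int] = {}
    next_col = start
    for name, redefines, fields in layout:
        # A REDEFINES item reuses the starting column of the item it redefines.
        col = item_start[redefines] if redefines else next_col
        item_start[name] = col
        for field, width in fields:
            ranges[field] = (col, col + width - 1)
            col += width
        if redefines is None:
            next_col = col  # only non-REDEFINES items consume new bytes
    return ranges, next_col - start


if __name__ == "__main__":
    ranges, described = column_ranges(LAYOUT)
    for field, (first, last) in ranges.items():
        print(f"{field:<25} {first:>2}-{last}")
    print("bytes described:", described)  # 12
    naive = sum(w for _, _, fields in LAYOUT for _, w in fields)
    print("naive sum of widths:", naive)  # 36
```

Ranges like these can be carried over into a SAS INPUT statement (for example, UMX-PAYMENT-SEC-1 as columns 3-11). Summing every field width instead gives 36 bytes, which is exactly the miscount described above.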
Most computer systems using VSAM files have
some editor which allows you to edit/browse a
VSAM file. FILE-AID is a well known and
extremely useful tool for doing this on an IBM
mainframe. Using FILE-AID and the 'COL' line
command you can often figure out your offsets
without looking at the COPYBOOK (if the fields are obvious). Or 'MAP' the file, i.e., edit the data while telling FILE-AID the name of the COPYBOOK. This gives you a PROC FSEDIT-like view of the file. Using the MAP option, position the field you are interested in at the top of the screen and read the column indicator (usually in the upper right corner). If you are really adventuresome, FILE-AID option 3.8 will allow you to 'compile' the map. Once it is compiled, the starting and ending columns for each field are displayed.
IT'S NOT WHAT YOU THINK
    ****************************************
    ** UTABLEX - COPYBOOK FOR INDEX
    ** ENTRY TABLE - START
    ****************************************
     01  IXE-INDEX-ENTRY.
         05  IXE-KEY.
    *            FIRM NUMBER
             10  IXE-ID-FIRM          PIC S9(04) USAGE COMP.
    *            OFFICE NUMBER
             10  IXE-ID-OFC           PIC X(06).
    *            TABLE NAME (LOGICAL FILE NAME)
             10  IXE-NM-TBL-INDX      PIC X(16).
    *            ENTRY NAME (LOW VALUES FOR CONTROL ENTRY)
             10  IXE-NM-ENTR          PIC X(16).
         05  IXE-DATA.
             10  FILLER               PIC X(06).
When you check the field IXE-NM-ENTR (use FILE-AID with MAP=ON), you see:
    IXE-NM-ENTR 16/AN
    X'01F54040404040404040404040404040'
This definitely is not what is commonly described as a 'character' field. This is some type of binary stuff, which you cannot read with a $ INFORMAT.
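One quick way to confirm that a 'PIC X' field is not really character data is to decode the bytes and test whether they are printable. The Python sketch below does this for the 16 bytes shown above. It assumes the US/Canada EBCDIC code page (Python's built-in `cp037` codec); your shop's code page may differ.

```python
# Assumes the US/Canada EBCDIC code page (Python's built-in "cp037" codec).
RAW = bytes.fromhex("01F5" + "40" * 14)  # X'01F54040...40', the 16 bytes shown above


def is_printable_ebcdic(raw: bytes, codepage: str = "cp037") -> bool:
    """True if every byte decodes to a printable character."""
    return raw.decode(codepage).isprintable()


if __name__ == "__main__":
    text = RAW.decode("cp037")
    print(len(RAW))                  # 16
    print(repr(text[:2]))            # '\x015'  (a control character, then "5")
    print(is_printable_ebcdic(RAW))  # False
```

The decoded text shows a plausible-looking "5" after a control character. This is how a field like this can fool a quick browse.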
It turns out that this is a 'general purpose' file where each field means many things, depending on the value of certain other fields. For the data I wanted, I was to read only the first two bytes of IXE-NM-ENTR (hex 01F5), and I was told that these two bytes were 'signed binary'!!!??? (They turned out to be PIB2.)
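For comparison, here is a Python sketch of how those two bytes decode as an unsigned big-endian binary integer (what the SAS PIBw. informat reads) versus a signed one (the SAS IBw. informat). The function names are illustrative only. X'01F5' is 501 either way because its high-order bit is 0; a value such as X'FFFE' would differ.

```python
# Big-endian binary decoding, as on IBM mainframes. Function names are illustrative.
def pib(raw: bytes) -> int:
    """Unsigned integer binary, comparable to the SAS PIBw. informat."""
    return int.from_bytes(raw, byteorder="big", signed=False)


def signed_binary(raw: bytes) -> int:
    """Two's-complement integer binary, comparable to the SAS IBw. informat."""
    return int.from_bytes(raw, byteorder="big", signed=True)


if __name__ == "__main__":
    first_two = bytes.fromhex("01F5")
    print(pib(first_two))            # 501
    print(signed_binary(first_two))  # 501: high bit is 0, so both agree
    print(pib(b"\xff\xfe"), signed_binary(b"\xff\xfe"))  # 65534 -2
```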
COPYBOOK field descriptions must be verified.
Just because this is listed as a PIC X(16) does
not mean that the field contains standard
printable characters.
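Packed decimal (COMP-3, discussed below) is another encoding that a $ INFORMAT cannot read. As a rough Python illustration of what the SAS PDw.d informat has to do, the sketch below computes the byte width of a PIC S9(i)V9(d) COMP-3 field and decodes packed bytes. It assumes the usual IBM layout: two digits per byte, with the sign in the last half-byte (hex C or F positive, D or B negative). The function names are hypothetical.

```python
# Assumes the IBM packed-decimal layout: two digits per byte, sign in the
# low half of the last byte (A, C, E, F positive; B or D negative).
from decimal import Decimal


def packed_width(int_digits: int, dec_digits: int) -> int:
    """Bytes used by PIC S9(int_digits)V9(dec_digits) COMP-3 (the w in PDw.d)."""
    nibbles = int_digits + dec_digits + 1  # every digit plus one sign nibble
    if nibbles % 2:                        # round up to a whole byte
        nibbles += 1
    return nibbles // 2


def unpack(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal; scale is the number of implied decimal places (the d)."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()                   # last nibble holds the sign
    if any(n > 9 for n in nibbles) or sign < 0x0A:
        raise ValueError("not valid packed decimal")
    value = Decimal("".join(str(n) for n in nibbles)).scaleb(-scale)
    return -value if sign in (0x0B, 0x0D) else value


if __name__ == "__main__":
    print(packed_width(5, 10))                 # 8, hence PD8.10
    print(unpack(bytes.fromhex("12345C"), 2))  # 123.45
    print(unpack(bytes.fromhex("00125D"), 0))  # -125
```

The width rule is the same one the paper walks through for TOT-COST: 5 + 10 digits, plus 1 for the sign, is 16 half-bytes, or 8 bytes.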
COMP-3
    05 TOT-COST PIC S9(5)V9(10) COMP-3.
COMP-3 is used in order to save big numbers in small spaces (it also has a number of advantages if you are going to do mathematical computations on these fields). But this takes extra effort to read in SAS.

First, the INFORMAT. In the IBM 370 world this would be read as:
    PD8.10
You get 8.10 by adding the (5) + (10). This makes 15; add one for the sign, which makes 16. (If adding 1 for the sign made an odd number, add one more.) Divide the 16 by 2 (as a packed field stores 2 digits in one 'column') and we find the field to be 8 columns wide. The V9(10) part is the implied decimal: 10 digits after an implied decimal point, which gives the .10.

REFERENCES
SAS Institute Inc. SAS Guide to VSAM Processing, Version 5 Edition. Cary, NC: SAS Institute Inc., 1985.

CONTACT INFORMATION
George P Sharrard
Burton, Greene, Smolka & Associates
14 Sunwich Road
Rowayton, CT 06853
E-Mail [email protected]