Code:
- XML-SAX
-
- I am using the CD Catalog XML file from http://www.w3schools.com/xml/xml_examples.asp.
-
- I copied the file cd_catalog.xml to /home/cpando
-
- CP1350R takes the XML file from the IFS and produces a file with each data item and its corresponding unique tags.
-
- e.g.
-
- <CATALOG>
- <CD>
- <TITLE>Empire Burlesque</TITLE>
- <ARTIST>Bob Dylan</ARTIST>
- <COUNTRY>USA</COUNTRY>
- <COMPANY>Columbia</COMPANY>
- <PRICE>10.90</PRICE>
- <YEAR>1985</YEAR>
- </CD>
- </CATALOG>
-
- is converted to:
-
- CATALOG/CD/TITLE Empire Burlesque
- CATALOG/CD/ARTIST Bob Dylan
- CATALOG/CD/COUNTRY USA
- CATALOG/CD/COMPANY Columbia
- CATALOG/CD/PRICE 10.90
- CATALOG/CD/YEAR 1985
- CATALOG/CD
-
-
- Each data item is associated with a unique combination of tags (henceforth called a hash).
- The last record (CATALOG/CD) could have gone first, but I use it to trigger a level break.
- The whole database is CATALOG, and the file is CD (here named CP1351F); a hash without a data item
- value is the trigger (event) to write a database record.
-
- The call to CP1350R looks like:
-
- CALL PGM(CP1350R) PARM('/home/cpando/cd_catalog.xml ')
-
- Apart from opening and closing the output file, there is one executable statement in the mainline:
-
- XML-SAX %Handler(handler:handlerInfo)
- %XML(%Trim(xmlDocument) : 'ccsid=37 ' +
- 'doc=file ');
-
- My control structure contains a stack (to hold the tags), an end-of-stack index,
- and a 'used' flag per stack level to show whether the element contained elements [1] that carried data.
- If an element is empty (e.g. <TITLE></TITLE>), there is no point in writing its tags out, because there is
- no data item with which to associate them.
-
- We recognize four events (XML-SAX is an event-driven parser):
-
- *XML_START_DOCUMENT
- reset the stack index to zero (this is not strictly necessary)
-
- *XML_START_ELEMENT
- increment the stack index, and append the new element name to the hash of the parent
-
- *XML_CHARS
- this is the data item; strip blanks and CR/LF, and write it with its hash
-
- *XML_END_ELEMENT
- if the element contained data items, write its hash alone (the level break); then decrement the stack index
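
The event handling above carries over to any SAX-style parser. As a rough cross-check of the logic (not the RPG program itself), here is a minimal Python sketch using the standard xml.sax module. The names PathFlattener, flatten and SAMPLE are hypothetical stand-ins for CP1350R, its call and cd_catalog.xml, and the records list stands in for the output file. One deliberate difference: Python may deliver character data in several chunks, so the sketch buffers text until the next tag instead of writing on every character event.

```python
import xml.sax


class PathFlattener(xml.sax.ContentHandler):
    """Python analogue of the tag-stack handler (all names are hypothetical)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one hash (tag path) per open element
        self.used = []     # True once a child of that element produced data
        self.text = []     # character chunks seen since the last tag
        self.records = []  # stands in for the output file

    def startElement(self, name, attrs):
        self._flush()
        parent = self.stack[-1] + "/" if self.stack else ""
        self.stack.append(parent + name)
        self.used.append(False)

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush()
        if self.used[-1]:
            self.records.append(self.stack[-1])  # hash alone = level break
        self.stack.pop()
        self.used.pop()

    def _flush(self):
        # Strip blanks and line ends; write the data item with its hash.
        value = "".join(self.text).strip()
        self.text = []
        if value:
            self.records.append(f"{self.stack[-1]} {value}")
            if len(self.stack) > 1:
                self.used[-2] = True  # mark the parent as holding data


def flatten(xml_text):
    handler = PathFlattener()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.records


SAMPLE = """<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>"""

if __name__ == "__main__":
    for record in flatten(SAMPLE):
        print(record)
```

Run against the single-CD sample, this should list the six data items followed by the bare CATALOG/CD hash, the same shape as the converted file shown earlier.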
-
- Given the file above, writing the program to generate a database record is relatively trivial.
-
- CP1351R is a greatly simplified version of the production code. It has 7 unique hashes,
- whereas the production version (bringing in a whole (minor) database) has 57. Extensive use is made
- of procedure pointers. If the hash has an associated data item, then the appropriate routine
- is called to scrub the data item and load it into the database field. If there is no associated data item
- for the hash (level break), a record is written. If the hash does not exist in the lookup table (someone slipped in a
- new data item), then an exception is written to QSYSPRT. In the example here, we are not interested in the
- data items corresponding to the hash CATALOG/CD/YEAR, so we simply throw them away (instead of having
- a scrub/load procedure, the table just maps the procedure pointer (procProxy@) to *NULL). I could've just as easily
- created a CD_YEAR procedure that did nothing. Six of one.
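
The dispatch idea is independent of RPG. Below is a minimal Python sketch of it, assuming the flattened lines are "hash, one blank, value". A dict stands in for the %LookUp table plus procedure-pointer array, None plays the role of *NULL, and the build_catalog, load and write_record names are hypothetical. The GENRE tag in the usage note is invented purely to show the exception path.

```python
from decimal import Decimal


def build_catalog(lines):
    """Replay flattened 'hash value' lines through a dispatch table."""
    record = {}      # the database record being assembled
    written = []     # stands in for CP1351F
    exceptions = []  # stands in for the QSYSPRT exception report

    def load(field, convert=str):
        # Build a scrub/load routine for one field.
        def routine(value):
            record[field] = convert(value.strip())
        return routine

    def write_record(_value):
        # Level break: write the record, then clear it for the next one.
        written.append(dict(record))
        record.clear()

    dispatch = {
        "CATALOG/CD": write_record,
        "CATALOG/CD/TITLE": load("TITLE"),
        "CATALOG/CD/ARTIST": load("ARTIST"),
        "CATALOG/CD/COUNTRY": load("COUNTRY"),
        "CATALOG/CD/COMPANY": load("COMPANY"),
        "CATALOG/CD/PRICE": load("PRICE", Decimal),
        "CATALOG/CD/YEAR": None,  # known hash, deliberately ignored
    }

    for number, line in enumerate(lines, start=1):
        tag, _, value = line.partition(" ")
        if tag not in dispatch:
            # Someone slipped in a new data item: report it and move on.
            exceptions.append((number, tag, value))
            continue
        routine = dispatch[tag]
        if routine is not None:
            routine(value)
    return written, exceptions


LINES = [
    "CATALOG/CD/TITLE Empire Burlesque",
    "CATALOG/CD/ARTIST Bob Dylan",
    "CATALOG/CD/COUNTRY USA",
    "CATALOG/CD/COMPANY Columbia",
    "CATALOG/CD/PRICE 10.90",
    "CATALOG/CD/YEAR 1985",
    "CATALOG/CD",
]

if __name__ == "__main__":
    print(build_catalog(LINES))
```

Feeding it a line such as "CATALOG/CD/GENRE Rock" (a made-up tag) puts that line on the exception list instead of raising, which mirrors the NOTFOUND routine.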
-
-
- <DIGRESSION> In production, the vast majority of time was (per Performance Explorer) spent on the lookup of
- the procProxy pointer:
-
- procProxy@ = @procProxy@(%LookUp(dataItemHash:tags));
-
- If we examine the table against which we are looking:
- CATALOG/CD
- CATALOG/CD/TITLE
- CATALOG/CD/ARTIST
- CATALOG/CD/COUNTRY
- CATALOG/CD/COMPANY
- CATALOG/CD/PRICE
- CATALOG/CD/YEAR
-
- we see that the first 10 characters of each entry are identical (in the production program, the first
- thirty characters were identical). That makes for a slow lookup.
- I could write a custom lookup function (procedure) that started from the right[2], or ...
-
-
- by changing the line in CP1350R (84.00)
-
- tag.@stack(tag.stack$) =
- %Trim(tag.@stack(tag.stack$-1)) + '/' + %SubSt(dta:1:dtaLen);
-
- to
-
- tag.@stack(tag.stack$) =
- %SubSt(dta:1:dtaLen) + '/' + %Trim(tag.@stack(tag.stack$-1));
-
- we reverse the order of the tags, meaning we can use the lookup table:
-
- CD/CATALOG
- TITLE/CD/CATALOG
- ARTIST/CD/CATALOG
- COUNTRY/CD/CATALOG
- COMPANY/CD/CATALOG
- PRICE/CD/CATALOG
- YEAR/CD/CATALOG
-
- Significant performance gain, but it obfuscates the code a little bit, so I left it out.
- </DIGRESSION>
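
To put a rough number on the shared-prefix effect, here is a small Python cost model. It assumes the lookup walks the table in order and compares fixed-width 60-character entries one character at a time from the left. The source does not say how %LookUp is actually implemented, so treat the counts as an illustration of the argument, not a measurement. The function names and WIDTH are hypothetical.

```python
WIDTH = 60  # the tags are Char(60) in the listing


def chars_compared(key, entry):
    """Characters examined, left to right, before a decision is reached."""
    a, b = key.ljust(WIDTH), entry.ljust(WIDTH)
    for i in range(WIDTH):
        if a[i] != b[i]:
            return i + 1  # mismatch found at position i
    return WIDTH          # full match: every position was checked


def lookup_cost(key, table):
    """Total characters examined by a sequential search for key."""
    total = 0
    for entry in table:
        total += chars_compared(key, entry)
        if entry == key:
            break
    return total


FORWARD = [
    "CATALOG/CD",
    "CATALOG/CD/TITLE",
    "CATALOG/CD/ARTIST",
    "CATALOG/CD/COUNTRY",
    "CATALOG/CD/COMPANY",
    "CATALOG/CD/PRICE",
    "CATALOG/CD/YEAR",
]

REVERSED_TAGS = [
    "CD/CATALOG",
    "TITLE/CD/CATALOG",
    "ARTIST/CD/CATALOG",
    "COUNTRY/CD/CATALOG",
    "COMPANY/CD/CATALOG",
    "PRICE/CD/CATALOG",
    "YEAR/CD/CATALOG",
]

if __name__ == "__main__":
    print(lookup_cost("CATALOG/CD/YEAR", FORWARD))
    print(lookup_cost("YEAR/CD/CATALOG", REVERSED_TAGS))
```

Under this model every rejected entry in the forward table costs at least eleven comparisons, while most rejected entries in the reversed table cost one. The longer the common prefix (thirty characters in production), the wider that gap gets.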
-
- After running CP1351R, we get (using RUNQRY to display CP1351F):
-
- ARTIST TITLE COUNTRY COMPANY PRICE
- Bob Dylan Empire Burlesque USA Columbia 10.90
-
-
-
- Conclusion: I could not, for the life of me, figure out how the XML-SAX opcode worked from the documentation.
- So I just put together a simple program, threw it into debug, and started examining the data structures. I
- was really surprised how simple it turned out to be once I understood how it worked. To best understand
- how CP1350R works, throw it into debug, and examine the data structures. It's really pretty simple.
-
- <DIGRESSION>One of the most confusing things about XML-SAX is the communication area on the %Handler built-in. I'm
- not entirely sure I know what they are doing here, but the point seems to be to side-step scope. I suspect
- it was going to be an API, and then at the last moment was implemented as an RPG %BIF. Anyway, I created
- CP1350R1, which doesn't use the communication area. The prototype of the handling procedure now looks like:
-
- handler pr 10i 0
- 1a
- 10i 0 Value
- * Value
- 20i 0 Value
- 10i 0 Value
-
- and the d specs of the procedure now look like:
-
- p handler b
- d pi 10i 0
- d ignore 1a
- d Event 10i 0 Value
- d dta@ * Value
- d dtaLen 20i 0 Value
- d exceptionID 10i 0 Value
- d dta s 1024a Based(dta@)
-
- d stack s 60a Dim(20) Static
- d stack$ s 3s 0 Static
- d used s n Dim(20) Static
-
- with the definition of the static variables replacing the data structure:
-
- d handlerInfo ds
- d stack 60a Dim(20)
- d stack$ 3s 0
- d used n Dim(20)
-
- This works just as well, and is (because the control variables are no longer part of
- a qualified data structure) easier to read.
- </DIGRESSION>
-
- [1] An element that contains elements is a file. Let's talk about this.
-
- Imagine a catalog database with two files, a cd master and an artist master.
-
- <CATALOG>
- <CD>
- <TITLE>Empire Burlesque</TITLE> <ARTISTID>123456</ARTISTID> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia</COMPANY> <PRICE
- </CD>
- <ARTISTMAST>
- <ARTISTID>123456</ARTISTID><ARTIST>Bob Dylan</ARTIST>
- </ARTISTMAST>
- </CATALOG>
-
- Our generated file would look like:
-
- CATALOG/CD/TITLE Empire Burlesque
- CATALOG/CD/ARTISTID 123456
- CATALOG/CD/COUNTRY USA
- CATALOG/CD/COMPANY Columbia
- CATALOG/CD/PRICE 10.90
- CATALOG/CD/YEAR 1985
- CATALOG/CD
- CATALOG/ARTISTMAST/ARTISTID 123456
- CATALOG/ARTISTMAST/ARTIST Bob Dylan
- CATALOG/ARTISTMAST
-
- CD and ARTISTMAST are files: they are elements that contain elements. Those contained elements constitute
- the fields in a record, and the containing element is the file. An element that contains elements does not
- (in this example) itself have any associated data items. This can be nested as deeply as needed.
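
As an illustration of "the containing element is the file", the Python sketch below replays the generated file above and groups fields into records per file, using the bare hash as the level break. It assumes each line is "hash, one blank, value"; split_into_files and its variable names are hypothetical, and a dict of lists stands in for the physical files.

```python
def split_into_files(lines):
    """Group flattened lines into records, one list of records per file."""
    files = {}    # file name -> list of records written so far
    pending = {}  # file hash -> fields collected for the current record
    for line in lines:
        tag, _, value = line.partition(" ")
        if value:
            # A data item: the last tag is the field, the rest is the file.
            parent, _, field = tag.rpartition("/")
            pending.setdefault(parent, {})[field] = value
        else:
            # A hash without a value: level break, write the record.
            name = tag.rpartition("/")[2]
            files.setdefault(name, []).append(pending.pop(tag, {}))
    return files


LINES = [
    "CATALOG/CD/TITLE Empire Burlesque",
    "CATALOG/CD/ARTISTID 123456",
    "CATALOG/CD/COUNTRY USA",
    "CATALOG/CD/COMPANY Columbia",
    "CATALOG/CD/PRICE 10.90",
    "CATALOG/CD/YEAR 1985",
    "CATALOG/CD",
    "CATALOG/ARTISTMAST/ARTISTID 123456",
    "CATALOG/ARTISTMAST/ARTIST Bob Dylan",
    "CATALOG/ARTISTMAST",
]

if __name__ == "__main__":
    for name, records in split_into_files(LINES).items():
        print(name, records)
```

Because pending is keyed by the full hash of the containing element, the same grouping works however deeply the files are nested.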
-
-
- [2] or I could reverse the order of the lookup argument, and use the table
- DC/GOLATAC
- ELTIT/DC/GOLATAC
- TSITRA/DC/GOLATAC
- YRTNUOC/DC/GOLATAC
- YNAPMOC/DC/GOLATAC
- ECIRP/DC/GOLATAC
- RAEY/DC/GOLATAC
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // //
- // XML Extract //
- // //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Ctl-Opt dftActGrp(*No) actGrp(*Caller)
- debug(*Yes) option(*SrcStmt:*NoDebugIO)
- Main(cp1350r) ;
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // ... files ... //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-F cp1350f Disk(250) Usage(*Output) UsrOpn ;
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // ... data structures ... //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-DS handlerInfo ;
- @stack Char(60) Dim(20) ;
- @used Ind Dim(20) ;
- stack$ Zoned(3:0) ;
- End-DS ;
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // Procedures //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- // Mainline //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cp1350r ;
- Dcl-PI *n ExtPgm ;
- xmlDocument Char(80) ;
- End-PI ;
-
- init() ;
- XML-SAX %Handler(handler:handlerInfo)
- %XML(%Trim(xmlDocument) : 'ccsid=37 ' +
- 'doc=file ') ;
- eoj() ;
-
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- // Handler //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc handler ;
- Dcl-PI *n Int(10) ;
- tag LikeDS(handlerInfo) ;
- event Int(10) Value ;
- dta@ Pointer Value ;
- dtaLen Int(20) Value ;
- exceptionID Int(10) Value ;
- End-PI ;
-
- Dcl-DS cp1350fds Len(250) End-DS ;
- Dcl-S dataItemVal Char(100) ; // same width as the field CP1351R reads back at Pos(61)
- Dcl-S dta Char(1024) Based(dta@) ;
-
- Select ;
-
- When ( event = *XML_START_DOCUMENT ) ;
- tag.stack$ = 0 ;
-
- When ( event = *XML_START_ELEMENT ) ;
- tag.stack$ += 1 ;
- tag.@used(tag.stack$) = *Off ; // reset this level only; an unindexed assignment clears the whole array
- If ( tag.stack$ = 1 ) ;
- tag.@stack(tag.stack$) = %SubSt(dta:1:dtaLen) ;
- Else ;
- tag.@stack(tag.stack$) =
- %Trim(tag.@stack(tag.stack$-1)) + '/' + %SubSt(dta:1:dtaLen) ;
- EndIf ;
-
- When ( event = *XML_CHARS ) ;
- dataItemVal = %Trim(%SubSt(dta:1:dtaLen):X'400D25') ; // trim EBCDIC blank (X'40'), CR (X'0D') and LF (X'25')
- If ( dataItemVal <> *Blanks ) ;
- cp1350fds = tag.@stack(tag.stack$) + dataItemVal ;
- tag.@used(tag.stack$-1) = *On ;
- Write cp1350f cp1350fds ;
- EndIf ;
-
- When ( event = *XML_END_ELEMENT ) ;
- If ( tag.@used(tag.stack$) ) ;
- cp1350fds = tag.@stack(tag.stack$) ;
- Write cp1350f cp1350fds ;
- EndIf ;
- tag.stack$ -= 1 ;
- EndSl ;
-
- Return 0 ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc init ;
- Open cp1350f ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc eoj ;
- Close cp1350f ;
- Return ;
- End-Proc ;
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
- * *
- * CD Catalog *
- * *
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
- A R CP1351
- A ARTIST 30A
- A TITLE 30A
- A COUNTRY 10A
- A COMPANY 10A
- A PRICE 5S 2
- A K ARTIST
- A K TITLE
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // //
- // Build Catalog From Intermediate File //
- // //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Ctl-Opt dftActGrp(*No) actGrp(*Caller)
- debug(*Yes) option(*SrcStmt:*NoDebugIO)
- Main(cp1351r) ;
- //--------------------------------------------------------------------------------------------------------------//
- // //
- //... files ... //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-F cp1350f Disk(250) InfDS(InfDB) UsrOpn ;
- Dcl-F cp1351f Usage(*Output) UsrOpn ;
- Dcl-F qsysprt Printer(225) Usage(*Output) UsrOpn ;
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // ... global variables ... //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-DS cp1350fds Len(250) ;
- dataItemHash Char(60) Pos(1) ;
- dataItemVal Char(100) Pos(61) ;
- End-DS ;
- Dcl-DS InfDB ;
- DBrrn Int(10) Pos(397) ;
- End-DS ;
- Dcl-S tags Char(60) Dim(8) CtData ;
- oqsysprt e 1
- o dataItemHash
- o dataItemVal
- o DBrrn z
- //--------------------------------------------------------------------------------------------------------------//
- // //
- // Procedures //
- // //
- //--------------------------------------------------------------------------------------------------------------//
- // Mainline //
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cp1351r ;
-
- Dcl-PR procProxy ExtProc(procProxy@) End-PR ;
- Dcl-S szProcProxy Zoned(3:0) Inz(%Elem(@procProxy@)) ;
- Dcl-S procProxy@ Pointer(*Proc) ;
-
- Dcl-DS *n ;
- *n Pointer(*Proc) Inz(%PAddr('CD')) ; // 1
- *n Pointer(*Proc) Inz(%PAddr('CD_TITLE')) ; // 2
- *n Pointer(*Proc) Inz(%PAddr('CD_ARTIST')) ; // 3
- *n Pointer(*Proc) Inz(%PAddr('CD_COUNTRY')) ; // 4
- *n Pointer(*Proc) Inz(%PAddr('CD_COMPANY')) ; // 5
- *n Pointer(*Proc) Inz(%PAddr('CD_PRICE')) ; // 6
- *n Pointer(*Proc) Inz(*Null) ; // 7 CD_YEAR - ignored
- *n Pointer(*Proc) Inz(%PAddr('NOTFOUND')) ; // 8 ... always last
- @ProcProxy@ Pointer(*Proc) Dim(8) Pos(1) ;
- End-DS ;
-
- init() ;
- DoW ( reader() ) ;
-
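- // sentinel: store the search key in the last slot so %LookUp always finds something;
- // a hash that is not in the table therefore lands on the last entry, NOTFOUND
- tags(szProcProxy) = dataItemHash ;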
- procProxy@ = @procProxy@(%LookUp(dataItemHash:tags)) ;
- If ( procProxy@ <> *Null ) ;
- procProxy() ;
- EndIf ;
-
- EndDo ;
- eoj() ;
-
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd ;
- Write cp1351 ;
- Clear cp1351 ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd_title ;
- TITLE = %Trim(DataItemVal) ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd_artist ;
- ARTIST = %Trim(dataItemVal) ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd_country ;
- COUNTRY = %Trim(dataItemVal) ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd_company ;
- COMPANY = %Trim(dataItemVal) ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc cd_price ;
- PRICE = %Dec(%Trim(dataItemVal):5:2) ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc notFound ;
- Except ;
- Return ;
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc reader ;
- Dcl-PI *n Ind End-PI ;
-
- Read cp1350f cp1350fds ;
- Return ( Not %Eof(cp1350f) ) ;
-
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc init ;
-
- Open cp1351f ;
- Open cp1350f ;
- Open qsysprt ;
- Return ;
-
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- Dcl-Proc eoj ;
-
- Close cp1351f ;
- Close cp1350f ;
- Close qsysprt ;
- Return ;
-
- End-Proc ;
- //--------------------------------------------------------------------------------------------------------------//
- ** 456789012345678901234567890123456789012345678901234567890
- CATALOG/CD                                                   001
- CATALOG/CD/TITLE                                             002
- CATALOG/CD/ARTIST                                            003
- CATALOG/CD/COUNTRY                                           004
- CATALOG/CD/COMPANY                                           005
- CATALOG/CD/PRICE                                             006
- CATALOG/CD/YEAR                                              007 not used - maps to *NULL
- tags not found must be last
-
|
|