midrange.com code scratchpad
Name: CP1350 - XML
Script language: Plain Text
Tab width: 4
Date: 05/17/2016 09:06:49 pm
Description: An example of the XML-SAX opcode, used to import databases in XML format
Code:
XML-SAX

I am using the CD Catalog XML file from http://www.w3schools.com/xml/xml_examples.asp.

I copied the file cd_catalog.xml to /home/cpando.

CP1350R takes the XML file from the IFS and produces a file with each data item and its corresponding unique tags.

For example,

<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
</CATALOG>

is converted to:

CATALOG/CD/TITLE                                            Empire Burlesque
CATALOG/CD/ARTIST                                           Bob Dylan
CATALOG/CD/COUNTRY                                          USA
CATALOG/CD/COMPANY                                          Columbia
CATALOG/CD/PRICE                                            10.90
CATALOG/CD/YEAR                                             1985
CATALOG/CD

Each data item is associated with a unique combination of tags (henceforth called a hash).
The last record (CATALOG/CD) could have gone first, but I use it to trigger a level break.
The whole database is CATALOG, and the file is CD (here named CP1351F). A hash without a data item
value is the trigger (event) to write a database record.

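The call to CP1350R looks like:

CALL PGM(CP1350R) PARM('/home/cpando/cd_catalog.xml                                                     ')

The path is padded with blanks to the full 80 bytes of the xmlDocument parameter. From the command line,
CL passes a character literal of 32 bytes or fewer padded only to 32 bytes, so a short literal would leave
the rest of the parameter undefined.
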
Apart from opening and closing the output file, there is one executable statement in the mainline:

XML-SAX  %Handler(handler:handlerInfo)
         %XML(%Trim(xmlDocument) : 'ccsid=37 ' +
                                   'doc=file ');

My control structure contains a stack (to hold the tags), an end-of-stack index,
and a 'used' flag for each level to show whether the element contained elements [1]
(e.g. for an empty <TITLE></TITLE> there is no point in writing the tags out, because there is no data
item with which to associate them).

We recognize 4 (four) events (XML-SAX is an event-driven parser):

XML_START_DOCUMENT
  set the stack index to zero (this is not strictly necessary)

XML_START_ELEMENT
  increment the stack index, clear the 'used' flag, and append the new element to the hash of the parent

XML_CHARS
  this is the data item; strip blanks and CR/LF, write it with its hash, and set the parent's 'used' flag

XML_END_ELEMENT
  if the element's 'used' flag is on, write its hash alone (no data item) as the level break;
  then decrement the stack index
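The four events above can be modelled outside RPG to check the logic. What follows is a minimal Python sketch, not the author's code. It uses the standard library's xml.sax, whose startElement / characters / endElement callbacks play the roles of *XML_START_ELEMENT / *XML_CHARS / *XML_END_ELEMENT. The names FlattenHandler and flatten are made up for illustration, and a Python list stands in for CP1350F. The sketch makes one assumption the RPG does not: text is buffered until the next tag, because expat may deliver a single text node in several characters() calls.

```python
import xml.sax

# The RPG trim mask X'400D25' is blank, CR and LF in CCSID 37; decode it to the same set.
TRIM_CHARS = bytes.fromhex("400D25").decode("cp037")


class FlattenHandler(xml.sax.ContentHandler):
    """Emits (hash, value) rows; a row whose value is None is a level break."""

    def __init__(self):
        super().__init__()
        self.stack = []  # hash (slash-joined tag path) of each open element
        self.used = []   # True once a child of that element has carried data
        self.text = []   # character data collected since the last tag
        self.rows = []   # stands in for the intermediate file CP1350F

    def startDocument(self):
        self.stack.clear()
        self.used.clear()
        self.text.clear()
        self.rows.clear()

    def characters(self, content):
        # Collect text first: expat may split one text node across several calls.
        self.text.append(content)

    def _flush(self):
        value = "".join(self.text).strip(TRIM_CHARS)
        self.text.clear()
        if value and self.stack:
            self.rows.append((self.stack[-1], value))
            if len(self.used) > 1:
                self.used[-2] = True  # the parent element is a "file"

    def startElement(self, name, attrs):
        self._flush()
        parent = self.stack[-1] + "/" if self.stack else ""
        self.stack.append(parent + name)
        self.used.append(False)  # only the new level starts out unused

    def endElement(self, name):
        self._flush()
        if self.used.pop():
            self.rows.append((self.stack[-1], None))  # level break
        self.stack.pop()


def flatten(xml_bytes):
    handler = FlattenHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.rows


if __name__ == "__main__":
    sample = b"""<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <PRICE>10.90</PRICE>
  </CD>
</CATALOG>"""
    for hash_, value in flatten(sample):
        print(f"{hash_:<60}{value or ''}")
```

The sketch clears only the new level's flag on each start element. As a result, an empty last child such as <YEAR/> does not suppress the parent's CATALOG/CD level break.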

Given the intermediate file above, writing the program to generate a database record is relatively trivial.

CP1351R is a greatly simplified version of the production code. It has 7 unique hashes,
whereas the production version (which brings in a whole, albeit minor, database) has 57. Extensive use is made
of procedure pointers. If the hash has an associated data item, the appropriate procedure is called to scrub
the data item and load it into the database field. If there is no associated data item for the hash (a level
break), a record is written. If the hash does not exist in the lookup table (someone slipped in a new data item),
an exception is written to QSYSPRT. The lookup itself cannot fail. Before each %LookUp the mainline copies the
hash into the last element of the tags table, so an unknown hash matches that last element, whose procedure
pointer is NOTFOUND. In the example here, we are not interested in the data items corresponding to the hash
CATALOG/CD/YEAR, so we simply throw them away. Instead of having a scrub/load procedure, the table maps the
procedure pointer (procProxy@) to *NULL, and the mainline skips the call. I could just as easily have created
a CD_YEAR procedure that did nothing. Six of one.
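Here is the dispatch in CP1351R restated as a small Python sketch. It is an illustration, not the author's code:

- A list of callables plays the role of the @ProcProxy@ procedure-pointer array.
- None plays *NULL.
- sentinel_lookup mimics the mainline's trick (tags(szProcProxy) = dataItemHash) of planting the search key in the last table slot, so an unknown hash always lands on the not-found procedure.

The function names and the dictionary standing in for the CP1351 record format are hypothetical.

```python
from decimal import Decimal

# Lookup table in the same order as the CTDATA in CP1351R; the last slot is the sentinel.
TAGS = [
    "CATALOG/CD",
    "CATALOG/CD/TITLE",
    "CATALOG/CD/ARTIST",
    "CATALOG/CD/COUNTRY",
    "CATALOG/CD/COMPANY",
    "CATALOG/CD/PRICE",
    "CATALOG/CD/YEAR",
    None,  # overwritten with the search key before every lookup
]


def sentinel_lookup(table, key):
    """Sequential search that cannot run off the end: the key is planted in the last slot."""
    table[-1] = key
    i = 0
    while table[i] != key:
        i += 1
    return i  # len(table) - 1 means "not found"


def build_catalog(rows):
    """Replays (hash, value) rows through a table of procedures, the way CP1351R does."""
    record, written, exceptions = {}, [], []

    def write_cd(hash_, value):  # level break: write the record, then clear it
        written.append(dict(record))
        record.clear()

    def loader(field, convert=str):  # builds a scrub-and-load procedure for one field
        def load(hash_, value):
            record[field] = convert(value.strip())
        return load

    def not_found(hash_, value):  # unknown hash: report it (QSYSPRT in the RPG)
        exceptions.append((hash_, value))

    procs = [
        write_cd,
        loader("TITLE"),
        loader("ARTIST"),
        loader("COUNTRY"),
        loader("COMPANY"),
        loader("PRICE", Decimal),
        None,       # CATALOG/CD/YEAR is ignored, like Inz(*Null)
        not_found,  # must stay last, paired with the sentinel slot
    ]
    tags = list(TAGS)  # private copy, since the lookup writes the sentinel
    for hash_, value in rows:
        proc = procs[sentinel_lookup(tags, hash_)]
        if proc is not None:
            proc(hash_, value)
    return written, exceptions
```

The sentinel saves a bounds check on every step of the search. The price is that the table and the procedure list must keep the not-found entry last, which is what the "must be last" note in the CTDATA records.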

<DIGRESSION> In production, the vast majority of time was (per Performance Explorer) spent on the lookup of
the procProxy pointer:

procProxy@ = @procProxy@(%LookUp(dataItemHash:tags));

If we examine the table against which we are looking:
CATALOG/CD
CATALOG/CD/TITLE
CATALOG/CD/ARTIST
CATALOG/CD/COUNTRY
CATALOG/CD/COMPANY
CATALOG/CD/PRICE
CATALOG/CD/YEAR

we see that the first 10 characters of each entry are identical (in the production program, the first
thirty characters were identical). %LookUp searches an unsorted array sequentially, and every comparison has
to get past that common prefix before it finds a difference, which makes for a slow lookup.
I could write a custom lookup function (procedure) that started from the right [2], or ...

by changing the line in CP1350R (84.00)

tag.@stack(tag.stack$) =
  %Trim(tag.@stack(tag.stack$-1)) + '/' + %SubSt(dta:1:dtaLen);

to

tag.@stack(tag.stack$) =
  %SubSt(dta:1:dtaLen) + '/' + %Trim(tag.@stack(tag.stack$-1));

we reverse the order of the tags, which means we can use the lookup table:

CD/CATALOG
TITLE/CD/CATALOG
ARTIST/CD/CATALOG
COUNTRY/CD/CATALOG
COMPANY/CD/CATALOG
PRICE/CD/CATALOG
YEAR/CD/CATALOG

Now the entries differ within their first few characters. That is a significant performance gain, but it
obfuscates the code a little bit, so I left it out.
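To put a rough number on this, here is a small Python model. It is an assumption-laden sketch, not a measurement, and not necessarily how %LookUp is implemented internally. It counts how many character positions a left-to-right sequential search compares when every entry is a blank-padded Char(60). It does this for three forms of the hashes:

- the original hashes,
- the tag-reversed hashes above,
- the character-reversed hashes of footnote [2].

For CATALOG/CD/YEAR it gives 131 positions for the original hashes and 66 for either reversed form. The helper names are hypothetical.

```python
TABLE = [
    "CATALOG/CD",
    "CATALOG/CD/TITLE",
    "CATALOG/CD/ARTIST",
    "CATALOG/CD/COUNTRY",
    "CATALOG/CD/COMPANY",
    "CATALOG/CD/PRICE",
    "CATALOG/CD/YEAR",
]


def reverse_tags(hash_):
    """Innermost tag first, e.g. CATALOG/CD/TITLE -> TITLE/CD/CATALOG."""
    return "/".join(reversed(hash_.split("/")))


def reverse_chars(hash_):
    """Footnote [2] variant, e.g. CATALOG/CD/TITLE -> ELTIT/DC/GOLATAC."""
    return hash_[::-1]


def positions_examined(table, key, width=60):
    """Character positions a left-to-right sequential search compares, Char(60) style."""
    key = key.ljust(width)
    total = 0
    for entry in table:
        entry = entry.ljust(width)
        n = 0
        while n < width and entry[n] == key[n]:
            n += 1
        total += min(n + 1, width)  # common prefix plus the first differing position
        if n == width:              # full match: stop searching
            return total
    raise KeyError(key.rstrip())


def cost(transform, key):
    """Positions examined to find key after applying transform to table and key."""
    return positions_examined([transform(t) for t in TABLE], transform(key))


if __name__ == "__main__":
    for label, transform in [("as is", str), ("tags reversed", reverse_tags),
                             ("chars reversed", reverse_chars)]:
        total = sum(cost(transform, key) for key in TABLE)
        print(f"{label:15} {total} positions to look up every hash once")
```

With 7 entries the difference is modest. The model suggests why it grows with the production table's 57 entries and 30-character common prefix.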
</DIGRESSION>

After running CP1351R, we get (using RUNQRY to display CP1351F):

 ARTIST                          TITLE                           COUNTRY    COMPANY   PRICE
 Bob Dylan                       Empire Burlesque                USA        Columbia  10.90

Conclusion: I could not, for the life of me, figure out how the XML-SAX opcode worked from the documentation.
So I just put together a simple program, threw it into debug, and started examining the data structures. I
was really surprised how simple it turned out to be once I understood how it worked. To best understand
how CP1350R works, throw it into debug and examine the data structures. It's really pretty simple.

<DIGRESSION>One of the most confusing things about XML-SAX is the communication area on the %Handler built-in.
I'm not entirely sure I know what they are doing here, but the point seems to be to side-step scope. Whatever
is passed as the second operand of %Handler is handed, by reference, to the handling procedure on every call, so
the handler can keep its state without global variables. I suspect it was going to be an API, and then at the
last moment was implemented as an RPG %BIF. Anyway, I created CP1350R1, which doesn't use the communication area
(%Handler still requires one, but the handler ignores it). The declaration of the handling procedure now looks like:

d handler         pr            10i 0
d                                1a
d                               10i 0 Value
d                                 *   Value
d                               20i 0 Value
d                               10i 0 Value

and the d specs of the procedure now look like:

p handler         b
d                 pi            10i 0
d  ignore                        1a
d  Event                        10i 0 Value
d  dta@                           *   Value
d  dtaLen                       20i 0 Value
d  exceptionID                  10i 0 Value
d  dta            s           1024a   Based(dta@)

d stack           s             60a   Dim(20) Static
d stack$          s              3s 0         Static
d used            s               n   Dim(20) Static

with the definition of the static variables replacing the data structure:

d handlerInfo     ds
d  stack                        60a   Dim(20)
d  stack$                        3s 0
d  used                           n   Dim(20)

This works just as well, and is (because the control variables are no longer part of
a qualified data structure) easier to read.
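The two versions differ only in where the handler's state lives. As a rough Python analogy (hypothetical names throughout; Python has no STATIC, so a closure stands in for it):

- In the first style the caller owns the state and passes it in on every event, much as %Handler's second operand does.
- In the second style the handler ignores that argument and keeps its own state.

A toy event list replaces the real parser.

```python
def run_parser(events, handler, comm_area):
    """Toy stand-in for XML-SAX: comm_area is passed to the handler on every event."""
    for event, data in events:
        handler(comm_area, event, data)


def track_depth(state, event):
    """Shared bookkeeping: current and deepest nesting level."""
    if event == "START_ELEMENT":
        state["depth"] += 1
        state["max_depth"] = max(state["max_depth"], state["depth"])
    elif event == "END_ELEMENT":
        state["depth"] -= 1


# CP1350R style: the caller owns the state and hands it over as the communication area.
def comm_area_handler(area, event, data):
    track_depth(area, event)
    return 0


# CP1350R1 style: the argument is ignored; the state lives with the handler itself,
# roughly what STATIC variables give the RPG procedure.
def make_static_handler():
    state = {"depth": 0, "max_depth": 0}

    def handler(_ignored, event, data):
        if event == "START_DOCUMENT":  # state persists between parses, so reset it here
            state.update(depth=0, max_depth=0)
        track_depth(state, event)
        return 0

    return handler, state


if __name__ == "__main__":
    events = [("START_DOCUMENT", ""), ("START_ELEMENT", "CATALOG"),
              ("START_ELEMENT", "CD"), ("END_ELEMENT", "CD"), ("END_ELEMENT", "CATALOG")]
    area = {"depth": 0, "max_depth": 0}
    run_parser(events, comm_area_handler, area)
    handler, state = make_static_handler()
    run_parser(events, handler, None)
    print(area, state)
```

In the second style, in the sketch as with STATIC variables, the state outlives a single parse. That is what makes a reset at the start-of-document event worthwhile.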
</DIGRESSION>

[1] An element which contains elements is a file. Let's talk about this.

Imagine a catalog database with two files: a CD master and an artist master.

<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTISTID>123456</ARTISTID>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <ARTISTMAST>
    <ARTISTID>123456</ARTISTID>
    <ARTIST>Bob Dylan</ARTIST>
  </ARTISTMAST>
</CATALOG>

Our generated file would look like:

CATALOG/CD/TITLE                                            Empire Burlesque
CATALOG/CD/ARTISTID                                         123456
CATALOG/CD/COUNTRY                                          USA
CATALOG/CD/COMPANY                                          Columbia
CATALOG/CD/PRICE                                            10.90
CATALOG/CD/YEAR                                             1985
CATALOG/CD
CATALOG/ARTISTMAST/ARTISTID                                 123456
CATALOG/ARTISTMAST/ARTIST                                   Bob Dylan
CATALOG/ARTISTMAST

CD and ARTISTMAST are files: they are elements that contain elements. The contained elements constitute
the fields in a record, and the containing element is the file. An element that contains elements does not
(in this example) itself have any associated data items. This can be nested as deeply as needed (in CP1350R,
up to the 20 levels of the tag stack, with hashes of up to 60 characters).
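Every file ends with its own level-break row. So a consumer does not need to know the file layout in advance to split the intermediate file into records. The following Python sketch groups the rows above by their parent hash and emits one record per level break. Because rows are keyed by the full parent path, nested files stay separate. The function name is hypothetical, and the sketch is a generic alternative to CP1351R's per-field procedures, not a replacement for them.

```python
def group_records(rows):
    """Turn flattened (hash, value) rows into (file, record) pairs.

    A data row belongs to its parent element (the "file"); a row whose value is None
    is that file's level break, so the fields collected for it become one record.
    """
    pending = {}  # parent hash -> fields collected since its last level break
    records = []
    for hash_, value in rows:
        if value is None:
            file_name = hash_.rpartition("/")[2]
            records.append((file_name, pending.pop(hash_, {})))
        else:
            parent, _, field = hash_.rpartition("/")
            pending.setdefault(parent, {})[field] = value
    return records


if __name__ == "__main__":
    rows = [
        ("CATALOG/CD/TITLE", "Empire Burlesque"),
        ("CATALOG/CD/ARTISTID", "123456"),
        ("CATALOG/CD", None),
        ("CATALOG/ARTISTMAST/ARTISTID", "123456"),
        ("CATALOG/ARTISTMAST/ARTIST", "Bob Dylan"),
        ("CATALOG/ARTISTMAST", None),
    ]
    for file_name, record in group_records(rows):
        print(file_name, record)
```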

[2] ... or I could reverse the lookup argument character by character (and the table entries with it), and use the table:
DC/GOLATAC
ELTIT/DC/GOLATAC
TSITRA/DC/GOLATAC
YRTNUOC/DC/GOLATAC
YNAPMOC/DC/GOLATAC
ECIRP/DC/GOLATAC
RAEY/DC/GOLATAC
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      //                                                                                                              //
      //                                                 XML Extract                                                  //
      //                                                                                                              //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Ctl-Opt dftActGrp(*No) actGrp(*Caller)
               debug(*Yes) option(*SrcStmt:*NoDebugIO)
               Main(cp1350r)                                                   ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      // ... files ...                                                                                                //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-F cp1350f Disk(250) Usage(*Output) UsrOpn                           ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      // ... data structures ...                                                                                      //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-DS handlerInfo                                                      ;
         @stack             Char(60)        Dim(20)                            ;
         @used              Ind             Dim(20)                            ;
         stack$             Zoned(3:0)                                         ;
       End-DS                                                                  ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      //                                                  Procedures                                                  //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
      //                                                   Mainline                                                   //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cp1350r                                                        ;
         Dcl-PI *n                                          ExtPgm             ;
           xmlDocument      Char(80)                                           ;
         End-PI                                                                ;

       init()                                                                  ;
       XML-SAX  %Handler(handler:handlerInfo)
                %XML(%Trim(xmlDocument) : 'ccsid=37 ' +
                                          'doc=file ')                         ;
       eoj()                                                                   ;

       Return                                                                  ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                   Handler                                                    //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc handler                                                        ;
         Dcl-PI *n          Int(10)                                            ;
           tag              LikeDS(handlerInfo)                                ;
           event            Int(10)         Value                              ;
           dta@             Pointer         Value                              ;
           dtaLen           Int(20)         Value                              ;
           exceptionID      Int(10)         Value                              ;
         End-PI                                                                ;

         Dcl-DS cp1350fds   Len(250)        End-DS                             ;
         // 100 bytes: matches dataItemVal in CP1351R (positions 61-160)
         Dcl-S  dataItemVal Char(100)                                          ;
         Dcl-S  dta         Char(1024)      Based(dta@)                        ;

         Select                                                                ;

         When ( event = *XML_START_DOCUMENT   )                                ;
           tag.stack$ = 0                                                      ;

         When ( event = *XML_START_ELEMENT    )                                ;
           tag.stack$ += 1                                                     ;
           // reset only this level's flag: clearing the whole array could lose
           // the parent's level break when its last child element is empty
           tag.@used(tag.stack$) = *Off                                        ;
           If ( tag.stack$ = 1 )                                               ;
             tag.@stack(tag.stack$) = %SubSt(dta:1:dtaLen)                     ;
           Else                                                                ;
             tag.@stack(tag.stack$) =
               %Trim(tag.@stack(tag.stack$-1)) + '/' + %SubSt(dta:1:dtaLen)    ;
           EndIf                                                               ;

         When ( event = *XML_CHARS            )                                ;
           // trim EBCDIC blank X'40', CR X'0D' and LF X'25' (CCSID 37)
           dataItemVal = %Trim(%SubSt(dta:1:dtaLen):X'400D25')                 ;
           If ( dataItemVal <> *Blanks )                                       ;
             cp1350fds = tag.@stack(tag.stack$) + dataItemVal                  ;
             If ( tag.stack$ > 1 )                                             ;
               tag.@used(tag.stack$-1) = *On                                   ;
             EndIf                                                             ;
             Write cp1350f cp1350fds                                           ;
           EndIf                                                               ;

         When ( event = *XML_END_ELEMENT      )                                ;
           If ( tag.@used(tag.stack$) )                                        ;
             cp1350fds = tag.@stack(tag.stack$)                                ;
             Write cp1350f cp1350fds                                           ;
           EndIf                                                               ;
           tag.stack$ -= 1                                                     ;
         EndSl                                                                 ;

         Return 0                                                              ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc init                                                           ;
       Open cp1350f                                                            ;
       Return                                                                  ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc eoj                                                            ;
       Close cp1350f                                                           ;
       Return                                                                  ;
       End-Proc                                                                ;
      * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
      *                                                                   *
      *  CD Catalog                                                       *
      *                                                                   *
      * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     A          R CP1351
     A            ARTIST        30A
     A            TITLE         30A
     A            COUNTRY       10A
     A            COMPANY       10A
     A            PRICE          5S 2
     A          K ARTIST
     A          K TITLE
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      //                                                                                                              //
      //                                     Build Catalog From Intermediate File                                     //
      //                                                                                                              //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Ctl-Opt dftActGrp(*No) actGrp(*Caller)
               debug(*Yes) option(*SrcStmt:*NoDebugIO)
               Main(cp1351r)                                                   ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      //... files ...                                                                                                 //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-F cp1350f        Disk(250)       InfDS(InfDB)    UsrOpn             ;
       Dcl-F cp1351f                        Usage(*Output)  UsrOpn             ;
       Dcl-F qsysprt        Printer(225)    Usage(*Output)  UsrOpn             ;
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      // ... global variables ...                                                                                     //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-DS cp1350fds     Len(250)                                           ;
         dataItemHash       Char(60)        Pos(1)                             ;
         dataItemVal        Char(100)       Pos(61)                            ;
       End-DS                                                                  ;
       Dcl-DS InfDB                                                            ;
         DBrrn              Int(10)         Pos(397)                           ;
       End-DS                                                                  ;
       Dcl-S tags           Char(60)        Dim(8) CtData                      ;
     oqsysprt   e                        1
     o                       dataItemHash
     o                       dataItemVal
     o                       DBrrn         z
      //--------------------------------------------------------------------------------------------------------------//
      //                                                                                                              //
      //                                                  Procedures                                                  //
      //                                                                                                              //
      //--------------------------------------------------------------------------------------------------------------//
      //                                                   Mainline                                                   //
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cp1351r                                                        ;

       Dcl-PR procProxy     ExtProc(procProxy@)             End-PR             ;
       Dcl-S  szProcProxy   Zoned(3:0)      Inz(%Elem(@procProxy@))            ;
       Dcl-S  procProxy@    Pointer(*Proc)                                     ;

       Dcl-DS *n                                                               ;
         *n                 Pointer(*Proc)  Inz(%PAddr('CD'))                  ; // 1
         *n                 Pointer(*Proc)  Inz(%PAddr('CD_TITLE'))            ; // 2
         *n                 Pointer(*Proc)  Inz(%PAddr('CD_ARTIST'))           ; // 3
         *n                 Pointer(*Proc)  Inz(%PAddr('CD_COUNTRY'))          ; // 4
         *n                 Pointer(*Proc)  Inz(%PAddr('CD_COMPANY'))          ; // 5
         *n                 Pointer(*Proc)  Inz(%PAddr('CD_PRICE'))            ; // 6
         *n                 Pointer(*Proc)  Inz(*Null)                         ; // 7   CD_YEAR - ignored
         *n                 Pointer(*Proc)  Inz(%PAddr('NOTFOUND'))            ; // 8   ... always last
         @ProcProxy@        Pointer(*Proc)  Dim(8) Pos(1)                      ;
       End-DS                                                                  ;

         init()                                                                ;
         DoW ( reader() )                                                      ;

           // sentinel: the last element always matches, so %LookUp never
           // returns 0; a hit on that element maps to NOTFOUND
           tags(szProcProxy) = dataItemHash                                    ;
           procProxy@ = @procProxy@(%LookUp(dataItemHash:tags))                ;
           If ( procProxy@ <> *Null )                                          ;
             procProxy()                                                       ;
           EndIf                                                               ;

         EndDo                                                                 ;
         eoj()                                                                 ;

         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd                                                             ;
         Write cp1351                                                          ;
         Clear cp1351                                                          ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd_title                                                       ;
         TITLE = %Trim(dataItemVal)                                            ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd_artist                                                      ;
         ARTIST = %Trim(dataItemVal)                                           ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd_country                                                     ;
         COUNTRY = %Trim(dataItemVal)                                          ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd_company                                                     ;
         COMPANY = %Trim(dataItemVal)                                          ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc cd_price                                                       ;
         PRICE = %Dec(%Trim(dataItemVal):5:2)                                  ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc notFound                                                       ;
         Except                                                                ;
         Return                                                                ;
       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc reader                                                         ;
         Dcl-PI *n          Ind                             End-PI             ;

         Read cp1350f cp1350fds                                                ;
         Return ( Not %Eof(cp1350f) )                                          ;

       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc init                                                           ;

       Open cp1351f                                                            ;
       Open cp1350f                                                            ;
       Open qsysprt                                                            ;
       Return                                                                  ;

       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
       Dcl-Proc eoj                                                            ;

       Close cp1351f                                                           ;
       Close cp1350f                                                           ;
       Close qsysprt                                                           ;
       Return                                                                  ;

       End-Proc                                                                ;
      //--------------------------------------------------------------------------------------------------------------//
** 456789012345678901234567890123456789012345678901234567890
CATALOG/CD                                                  001
CATALOG/CD/TITLE                                            002
CATALOG/CD/ARTIST                                           003
CATALOG/CD/COUNTRY                                          004
CATALOG/CD/COMPANY                                          005
CATALOG/CD/PRICE                                            006
CATALOG/CD/YEAR                                             007 not used - maps to *NULL
tags not found                                              must be last
© 2004-2019 by midrange.com