Dynamic Message Parsing in App Connect Enterprise

Foreword

The Problem

Imagine that you have an integration in which you receive booking messages (as files from an FTP folder, via an MQ queue, and so on) and perform an operation that requires access to their content, such as a mapping. Also imagine that different partners send you these messages in different formats, such as plain text and XML, because those are the formats they are familiar with.

First, you need to parse your message from a bitstream into the domain in which it is defined. There are multiple ways to do this in ACE.

One way is to use the Input Message Parsing tab of your input node to specify the domain, message model, message, and physical format (the meaning of these options largely depends on the chosen domain). This tells the input node which parser to use and how to configure it, so your message is parsed and placed in the message tree accordingly. Note that parsing will fail if the configured options do not match the actual definition of your input messages.

Another option is to use the Reset Content Descriptor node. Here again, you set the domain, message model, message, and physical format in the Basic tab and check the corresponding reset checkboxes. The node then parses its input message and places it in the message tree according to that configuration. Note that this node doesn’t convert your message. For instance, it won’t magically turn a JSON message into an XML message. It only tries to parse the input according to the setup, and it fails if your message doesn’t match it.

These are the most common ways of parsing messages in ACE flows, and both share the same downside: you need to know at design time which message type your node will receive, which is not the case here.

In the case of the flow input node, you can’t set overrides for the node configuration dynamically via code (even if such overrides existed), since it is the first node to be executed. So you are stuck with what you set at design time.

In the case of the Reset Content Descriptor node, there are no overrides for the node properties at all. Even though this node won’t be the first in your flow, you have no means to change its behavior dynamically at runtime.

You may decide to use multiple flows, each having an input node with a different parsing setup or a Reset Content Descriptor node with a different configuration. You may also have a single flow to which you add an input node for each format (as long as you can tell the sources apart, for example by having a separate folder or queue per input format). This is a perfectly fine approach, but it may become cumbersome as the number of formats, or of message versions per format, starts to increase.

Another option is to add a configuration that tells your flow which message type to use for each message received (matching the configuration to some key in the MQ headers or in the filename), and to use dynamic parsing at runtime. The flow would first load the configuration (for instance from a database, possibly cached), match the specific configuration entry to the current execution, and then use dynamic parsing.
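As a rough illustration of the configuration lookup described above, the following Compute node sketch reads a parsing configuration from a database and stores it in the Environment tree for later nodes. Everything specific here is an assumption made for the example: the table PARSE_CONFIG and its columns, the file name convention, the field names under Environment.Variables.ParseConfig, and the fact that a data source is configured on the node. Caching is left out to keep the sketch short.

```esql
CREATE COMPUTE MODULE LoadParseConfig_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Pass the still unparsed message through unchanged.
    SET OutputRoot = InputRoot;

    -- Assumption: file names start with a partner key followed by '_',
    -- for example 'ACME_bookings.dat'.
    DECLARE fileName CHARACTER InputLocalEnvironment.File.Name;
    DECLARE partnerKey CHARACTER SUBSTRING(fileName BEFORE '_');

    -- Hypothetical table and columns, read through the data source
    -- configured on this Compute node.
    SET Environment.Variables.ParseConfig[] =
      (SELECT C.MSG_DOMAIN AS MsgDomain,
              C.MSG_SET    AS MsgSet,
              C.MSG_TYPE   AS MsgType,
              C.MSG_FORMAT AS MsgFormat
         FROM Database.PARSE_CONFIG AS C
        WHERE C.PARTNER_KEY = partnerKey);

    RETURN TRUE;
  END;
END MODULE;
```

For MQ input, the key could be taken from a header field instead of the file name; the rest of the lookup would stay the same.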

CREATE Statement with Parse Clause

ESQL offers one way to do dynamic parsing: the CREATE statement with the PARSE clause. The CREATE statement creates a new message field in your message tree and provides options to modify its behavior.

The PARSE clause allows you to create the target field as the result of parsing a bitstream under a configuration that you specify dynamically via parameters. Those parameters include the encoding and CCSID of your message, as well as the set, type, and format, all of which you can determine beforehand from your configuration.

Note that these parameters don’t include the domain, but you can create your target field under the domain obtained from your configuration by using the DOMAIN clause. A full statement would look something like this:

CREATE LASTCHILD OF <targetRoot> DOMAIN(<configuredDomain>)
  PARSE(<messageAsBitStream>
        ENCODING <configuredEncoding>
        CCSID <configuredCcsid>
        SET <configuredSet>
        TYPE <configuredType>
        FORMAT <configuredFormat>
        OPTIONS <options>);

The options in the last parameter control, for example, whether validation is applied and, if so, what type of validation is used and how errors are handled. These options are constants provided by ACE, and multiple options are combined with the BITOR function. See the IBM documentation for the CREATE statement for full details.
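To make the OPTIONS parameter more concrete, here is a minimal sketch of a helper procedure that parses a bitstream as XMLNSC with validation switched on. The procedure name and its parameters are made up for this example, and it assumes that an XSD for the message is deployed in the same application as the flow, so that no SET value is needed.

```esql
CREATE PROCEDURE ParseValidatedXml(IN payload BLOB,
                                   IN enc INTEGER,
                                   IN ccsid INTEGER,
                                   IN target REFERENCE)
BEGIN
  -- Combine a validation level with a failure action:
  -- validate content and values, and throw an exception on the first failure.
  DECLARE parseOptions INTEGER BITOR(ValidateContentAndValue, ValidateException);

  -- Parse the bitstream and attach the result as the last child of the target.
  CREATE LASTCHILD OF target DOMAIN('XMLNSC')
    PARSE(payload
          ENCODING enc
          CCSID ccsid
          OPTIONS parseOptions);
END;
```

From a Compute node it could be called as CALL ParseValidatedXml(InputRoot.BLOB.BLOB, InputProperties.Encoding, InputProperties.CodedCharSetId, OutputRoot);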

Thanks to this statement you may configure your flow by performing the following steps:

  1. Get your message via your input node as bitstream (BLOB).
  2. Load the configuration (skip if you have it cached).
  3. Determine the configuration to be used for your input message (for instance based on a key in the input filename or MQ headers…).
  4. In a Compute node (ESQL), parse your input message with the CREATE statement and its PARSE clause, and place the result in the output.
  5. Proceed with the rest of the message handling of the flow.
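The steps above can be sketched as a single Compute module. This is only an outline under a few assumptions: the input node uses the BLOB domain, and an earlier node has already stored the selected configuration under Environment.Variables.ParseConfig with the hypothetical fields MsgDomain, MsgSet, MsgType, and MsgFormat.

```esql
CREATE COMPUTE MODULE DynamicParse_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Steps 2-3 (assumed done earlier): hypothetical configuration fields
    -- stored in the Environment tree by a previous node.
    DECLARE cfg REFERENCE TO Environment.Variables.ParseConfig;
    IF NOT LASTMOVE(cfg) THEN
      THROW USER EXCEPTION MESSAGE 2951 VALUES('No parse configuration found');
    END IF;

    -- Copy every header, but not the last child (the unparsed BLOB body).
    DECLARE i INTEGER 1;
    DECLARE childCount INTEGER CARDINALITY(InputRoot.*[]);
    WHILE i < childCount DO
      SET OutputRoot.*[i] = InputRoot.*[i];
      SET i = i + 1;
    END WHILE;

    -- Step 4: parse the bitstream (step 1) with the configured domain and model.
    -- Missing configuration values fall back to an empty string.
    CREATE LASTCHILD OF OutputRoot DOMAIN(cfg.MsgDomain)
      PARSE(InputRoot.BLOB.BLOB
            ENCODING InputProperties.Encoding
            CCSID InputProperties.CodedCharSetId
            SET COALESCE(cfg.MsgSet, '')
            TYPE COALESCE(cfg.MsgType, '')
            FORMAT COALESCE(cfg.MsgFormat, ''));

    -- Step 5: propagate the parsed message to the rest of the flow.
    RETURN TRUE;
  END;
END MODULE;
```

Keeping the configuration in the Environment tree means this module does not need to know where the configuration came from, be it a database, a cache, or a file.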

Domain, Set, Type, and Format Parameters

We need to elaborate a bit further on the parameters required for the CREATE statement.

First of all, you may have noticed a difference in naming compared to the parsing options of the input nodes or the basic options of the Reset Content Descriptor node (which use message domain, message model, message, and physical format). The names used in the CREATE statement are reminiscent of the MRM models used mainly in older versions of the product. Nonetheless, they refer to the same things: SET = Message Model, TYPE = Message, and FORMAT = Physical Format. Thankfully, DOMAIN remains the same everywhere.

But what do these terms actually refer to? Let’s go one by one:

  • DOMAIN. The domain is the main type of your message and can be any of those supported by ACE, such as XMLNSC, JSON, DFDL, BLOB, MRM, or MIME. Each domain is associated with a particular parser. Some of them (like MRM or DFDL) require a message model in order to parse and serialize messages. Others, like XMLNSC, can parse and serialize messages without any message model (although, as we’ll see later, having a model in this case allows for more thorough validation and parsing).
  • SET (or message model). Here the meaning starts to differ depending on the domain.
    • For DFDL and XMLNSC this represents the shared library in which the model is defined, if it is located in one, and is left empty otherwise. The library name must be written between curly braces, as in {MysharedLibraryName}.
    • For MRM models this represents the Message Set name of the MRM model, without any reference to the library that contains it, if there is one.
  • TYPE (or message). This represents the message of an MRM or DFDL model. In the case of a DFDL model it also contains the message namespace between curly braces, as in {myNamespace}:myMessageModelName. For most other domains it has no use and shouldn’t be provided (pass a null parameter).
  • FORMAT (or physical format). This represents the physical properties of the message in an MRM model, as defined in the message set. It can have values like XML or EDIFACT. For most other domains it has no use and shouldn’t be provided (pass a null parameter).

Since the meaning varies from domain to domain, you should always check the documentation of the associated parser to find out what each parameter means in each case.
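As a hedged illustration of how the same parameters take different shapes per domain, the two helper procedures below parse a bitstream as DFDL and as MRM respectively. All names are hypothetical and only follow the conventions described above: the shared library BookingModels, the namespace http://example.com/booking, the DFDL message BookingText, the message set BookingMessageSet, the MRM message msg_Booking, and the physical format Text1.

```esql
-- DFDL: SET is the shared library in curly braces,
-- TYPE is '{namespace}:messageName', FORMAT is not used.
CREATE PROCEDURE ParseBookingAsDfdl(IN payload BLOB, IN enc INTEGER,
                                    IN ccsid INTEGER, IN target REFERENCE)
BEGIN
  CREATE LASTCHILD OF target DOMAIN('DFDL')
    PARSE(payload
          ENCODING enc
          CCSID ccsid
          SET '{BookingModels}'
          TYPE '{http://example.com/booking}:BookingText');
END;

-- MRM: SET is the message set name, TYPE is the message,
-- FORMAT is the physical format defined in the message set.
CREATE PROCEDURE ParseBookingAsMrm(IN payload BLOB, IN enc INTEGER,
                                   IN ccsid INTEGER, IN target REFERENCE)
BEGIN
  CREATE LASTCHILD OF target DOMAIN('MRM')
    PARSE(payload
          ENCODING enc
          CCSID ccsid
          SET 'BookingMessageSet'
          TYPE 'msg_Booking'
          FORMAT 'Text1');
END;
```

In a configuration-driven flow these literal values would come from the configuration rather than being hard-coded.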

A tip to make the search a bit easier is to use the parsing options of an input node, or the basic options of a Reset Content Descriptor node. Have a message definition for the domain at hand and select that domain. Some options will be enabled automatically, and the rest will be disabled (greyed out), meaning that they cannot be configured. For each enabled option, you can then use the browse button to search for a suitable element. The ACE toolkit filters the list and shows only the elements that you may choose, be it a shared library, a message definition from a DFDL model, or the physical format of an MRM message.

Note on Parsing XML Messages

The approach described in the previous section works with XML messages with one caveat: all fields are loaded into the message tree as character type. The only way to parse an XML message with its correct types is to provide an XSD model. Without one, the parser can still parse the message, but it doesn’t know which type to assign to each element, so it treats them all as characters.

But there is more. Besides having an XSD, you need to enable message validation and the option Build tree using XML schema data types. All of these options are available in the Reset Content Descriptor node.

In the Basic tab, select the domain and the message model, which in this case refers to the shared library in which the XSD is located. If the XSD is located in the same application as your flow, leave the message model empty.

Enable message validation in the Validation tab. This in turn enables the Build tree using XML schema data types option in the Parser Options tab (otherwise it remains greyed out and can’t be set).

However, as indicated earlier, the configuration of the Reset Content Descriptor node can’t be overridden at runtime, so this won’t be enough for our purpose.

To achieve this result, we need to apply an undocumented option in the options clause of the previously described CREATE statement. The option is XMLNSC.BuildTreeUsingSchemaTypes.

First, we set the options to be used in the create statement, including validation options:

DECLARE options INTEGER BITOR(ValidateContentAndValue,
                              XMLNSC.BuildTreeUsingSchemaTypes);

Then execute the CREATE statement as indicated previously:

CREATE LASTCHILD OF <targetRoot> DOMAIN(<configuredDomain>)
  PARSE(<messageAsBitStream>
        ENCODING <configuredEncoding>
        CCSID <configuredCcsid>
        SET <configuredSet>
        TYPE <configuredType>
        FORMAT <configuredFormat>
        OPTIONS options);

Overriding Parsing Options of MQ Input Node

A bit earlier in this document, it was stated that you can’t dynamically set any overrides for the configuration of an input node, since it is the first node to be executed. That is, however, not entirely accurate, since there is one case in which it is possible: the MQ input node.

It is true that the parsing options can’t be overridden from within the same flow, but what if they could be overridden by the message itself? This way a sending application could set its parsing configuration in the message, and the flow would parse the message accordingly at runtime.

This is exactly what is possible with MQ messages, by making use of the mcd folder fields. The mcd folder is a subfolder of the MQRFH2 header and contains the following fields, which are a direct match to those used in the PARSE clause of the CREATE statement in ESQL:

  • Msd. This indicates the domain of your message. It corresponds to the DOMAIN clause of the CREATE statement.
  • Set. Your message SET (or message model), matching one to one the meaning of the SET option in the PARSE clause, with a different use depending on the domain.
  • Type. Matches the TYPE option of the PARSE clause, mainly relevant for DFDL and MRM domains.
  • Fmt. The message format (or physical format). Matches the FORMAT option of the PARSE clause (only relevant for the MRM domain).

By default, the MQ input node uses the parsing configuration set on the node itself. However, if the mcd folder is present and its <Msd> element contains a valid domain, the configuration in the mcd folder takes precedence. This way it is possible to override the node configuration and change its parsing behavior at runtime.
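On the sending side, the mcd folder could be populated along these lines if the sender happens to be another ACE flow. This is a minimal sketch, not a definitive implementation: it assumes the incoming message has an MQMD header and a BLOB body, and the message set, message, and physical format names (BookingMessageSet, msg_Booking, Text1) are hypothetical.

```esql
CREATE COMPUTE MODULE SetMcdFolder_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    SET OutputRoot.Properties = InputRoot.Properties;

    -- Announce that an MQRFH2 header follows the MQMD.
    SET OutputRoot.MQMD = InputRoot.MQMD;
    SET OutputRoot.MQMD.Format = MQFMT_RF_HEADER_2;

    -- MQRFH2 fixed fields, then the mcd folder with the parsing hints
    -- that the receiving MQ input node will pick up.
    SET OutputRoot.MQRFH2.(MQRFH2.Field)Version = 2;
    SET OutputRoot.MQRFH2.(MQRFH2.Field)Format = 'MQSTR';
    SET OutputRoot.MQRFH2.mcd.Msd = 'mrm';
    SET OutputRoot.MQRFH2.mcd.Set = 'BookingMessageSet';
    SET OutputRoot.MQRFH2.mcd.Type = 'msg_Booking';
    SET OutputRoot.MQRFH2.mcd.Fmt = 'Text1';

    -- The body is passed on unchanged.
    SET OutputRoot.BLOB = InputRoot.BLOB;
    RETURN TRUE;
  END;
END MODULE;
```

Any other MQ application that can write an MQRFH2 header can provide the same four fields.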

Closing

In this blog, we introduced the concept of configuration-driven dynamic processing in the scope of a custom integration framework, and how the lessons learned from it can be applied in other contexts.

Next, we focused on dynamic message parsing, showing the most commonly used parsing options in ACE and why they don’t fully achieve the goal.

We then detailed the ESQL CREATE statement, which can be used in combination with the PARSE clause to achieve real dynamic message parsing in ACE.

After that, we reviewed the parameters of the PARSE clause of the CREATE statement and explained what they mean depending on the message domain. We then paused to clarify some limitations and requirements when using the CREATE statement with the XMLNSC domain, and finally looked at how to override the parsing options of an MQ input node via the message headers.

 

https://www.linkedin.com/in/eduardo-cabezas-l%C3%B3pez-68aa5315/
