RTP Admin: Scenarios DLQ and Handler with IBM MQ

RTP

This article was developed with the intention to help IBM MQ Administrators gain a better understanding of Dead Letter Queues (DLQ) and DLQ Handlers for RTP administration. It provides basic scenarios and explanations within IBM MQ.

Becoming a Real Time Payments (RTP) “participant” has numerous challenges.  For many financial entities, this is their first exposure to IBM MQ as a messaging system, which is a requirement to join the network.

The Clearing House’s RTP represents the next evolution of payments innovation for instant transfer of funds and confirmation. Financial institutions, big and small, are offering Real Time Payments to their customers to stay competitive.

 

IBM MQ and RTP

At its fundamentals, the RTP network was built upon the concept of MQ request/reply.  For example, a participant sends a request message following the ISO 20022 XML standard.

MQ request/reply Example

The overall lifespan of the “transaction” is fifteen (15) seconds, but the “hops” between the participant to RTP to participant and back are two (2) seconds each between its respective queue manager.  The Clearing House’s RTP documentation provides a chart to explain further, and is outside the narrative of this article.

When everything works through “happy path”, it’ll be as shown above: the participant app sends a request message and awaits a response within the 15 second window.

But what happens if something goes wrong with the reply message being placed on RESPONSE.FROM.RTP1?

 

THE DEAD LETTER QUEUE AND THE DEAD LETTER HANDLER

THE DEAD LETTER QUEUE AND THE DEAD LETTER HANDLER

A feature of IBM MQ is the assured delivery of the messages to the destination. There can be number of scenarios where MQ may not be able to deliver messages to the destination queue, and all those are routed to the Dead Letter Queue (DLQ) defined for the queue manager.

Before placing message to the DLQ, IBM MQ attaches Dead Letter Header (DLH) to each message which contains all the required information to identify why the message was placed in DLQ and what was the intended destination. This DLH plays a very important role while handling the messages on DLQ.

THE DEAD LETTER QUEUE AND THE DEAD LETTER HANDLER 2

A feature of IBM MQ is the assured delivery of the messages to the destination. There can be number of scenarios where MQ may not be able to deliver messages to the destination queue, and all those are routed to the Dead Letter Queue (DLQ) defined for the queue manager.

Before placing message to the DLQ, IBM MQ attaches Dead Letter Header (DLH) to each message which contains all the required information to identify why the message was placed in DLQ and what was the intended destination. This DLH plays a very important role while handling the messages on DLQ.

THE DEAD LETTER QUEUE AND THE DEAD LETTER HANDLER 3

IBM has provided a Dead Letter Handler utility called “runmqdlq”.  Its purpose is to handle messages which are placed on the DLQ. With it, the routine takes necessary action on the messages placed on the DLQ.  A rule table can be created with required control data and rules and the same can be used as an input to the Dead Letter Handler.

 

TYPES OF DEAD LETTER MESSAGES

In a nutshell, messages placed on the DLQ are in two categories:
– Replayable Messages
– Non-RePlayable Messages

For Replayable Messages, these are placed on the DLQ due to some temporary issues like the destination queue is full or the destination queue is put disabled etc. These messages can be replayed without any further investigation. Also, re-playable messages can be performed by using dlqhandler.

For Non-Replayable Messages, these are placed on the DLQ due to issues which are not temporary in nature (incorrect destination, data conversion errors, mqrc_unknown_reason). These messages require further investigation to identify the cause on why those were placed on the DLQ. Replaying this category of messages with dlqhandler most likely will again arrive on DLQ. Therefore, they would require an MQ Admin to investigate further.

 

THE DLQ HANDLER SETUP

Replayable messages can be run again (reprocessed) by using the dlqhandler. In short, the dlqhandler is setup as follows:
1) Creating the dlqhandler rule file
2) Starting the dlqhandler

The rule file is a simple text-based flat file with configuration instructions as to how a message should be handled.

 

COMMON SCENARIOS

A number of scenarios are possible, based on the design of MQ infrastructure at an organization, but the most common scenarios are:

 

  • SINGLE-PURPOSE: Replayable messages are “replayed” to their original queue and non-replayable messages are placed on a single designated queue for an MQ admin to investigate

 

  • MULTI-PURPOSE: Replayable messages are “replayed” to their original queue and non-replayable messages are placed on a designated queue for each application for an MQ admin to investigate

 

  • HYBRID-PURPOSE: Replayable and non-replayable messages from specific queues are placed on a designated queue for each application for an MQ admin to investigate, while other replayable messages are “replayed” to their original queue, and non-replayable messages are placed on a single designated queue for an MQ admin to investigate.

 

It is worth noting that since messages being put to the DLQ because of their destination queue being full (MQRC_Q_FULL) or put inhibited (MQRC_PUT_INHIBITIED) will be in constant retry mode.  That said, it’s important to have a monitoring and alerting setup on a queue manager and associated mq objects to send an alert (email, page, etc.) for someone to investigate why a queue is approaching 100% depth or put inhibited.

 

SCENARIO 1: SINGLE PURPOSE

This is a very common requirement. This can be achieved by defining the rule file.

Contents of <scenario1.rul> file:

******=====Retry every 30 seconds, and WAIT for messages =======*****
RETRYINT(30) WAIT(YES)

***** For reason queue full and put disabled *******
**** retry continuously to put the message on the original queue ****
REASON(MQRC_Q_FULL) ACTION(RETRY) RETRY(999999999)
REASON(MQRC_PUT_INHIBITED) ACTION(RETRY) RETRY(999999999)

**** For all other dlq messages, move those to designated queue *****
ACTION(FWD) FWDQ(UNPLAYABLE.DLQ) HEADER(YES)
******=========================================================*******

 

SCENARIO 2: MULTI PURPOSE

Suppose there are 3 applications using the qmgr: APP1, APP2 and APP3. The following are the queues for these applications and dedicated queue to hold dlq messages for each application:
APP1 –
APP1.QL
DLQ – APP1.ERR.DLQ
APP2 –
APP2.QL
DLQ – APP2.ERR.DLQ
APP3 –
APP3.QL
DLQ – APP3.ERR.DLQ

We now have to write rules based on the DESTQ for each application. Below the example rule file for this scenario.
Contents of <scenario2.rul> file:

******=====Retry every 30 seconds, and WAIT for messages =======*****
RETRYINT(30) WAIT(YES)
***** For reason queue full and put disabled *******
**** retry continuously to put the message on the original queue ****
REASON(MQRC_Q_FULL) ACTION(RETRY) RETRY(999999999)
REASON(MQRC_PUT_INHIBITED) ACTION(RETRY) RETRY(999999999)
******* For APP1, forward messages to APP1.ERR.DLQ ******
DESTQ(APP1.QL) ACTION(FWD) FWDQ(APP1.ERR.DLQ) HEADER(YES)

******* For APP2, forward messages to APP2.ERR.DLQ ******
DESTQ(APP2.QL) ACTION(FWD) FWDQ(APP2.ERR.DLQ) HEADER(YES)

******* For APP3, forward messages to APP3.ERR.DLQ ******
DESTQ(APP3.QL) ACTION(FWD) FWDQ(APP3.ERR.DLQ) HEADER(YES)
**** For all other dlq messages, move those to designated queue *****
ACTION(FWD) FWDQ(GENEAL.ERR.DLQ) HEADER(YES)
*********=========================================================******

 

SCENARIO 3: HYBRID PURPOSE 

This is an example of regardless as to why the message was put on the DLQ for APP1.QL, the message needs to be forwarded to APP1.ERR.DLQ.

For other non-APP1 DLQ messages, attempt to replay them to their intended destination queue for queue full and queue inhibited messages.

If the retry interval is > 10 or it’s a non-replayable message, forward that message to the UNPLAYABLE.DLQ.

Contents of <scenario3.rul> file

******=====Retry every 30 seconds, and WAIT for messages =======*****
RETRYINT(30) WAIT(YES)

******* For APP1, forward messages to APP1.ERR.DLQ ******
DESTQ(APP1.QL) ACTION(FWD) FWDQ(APP1.ERR.DLQ) HEADER(YES)

***** For reason queue full and put disabled *******
**** retry 10 times to put the message on the original queue ****
REASON(MQRC_Q_FULL) ACTION(RETRY) RETRY(10)
REASON(MQRC_PUT_INHIBITED) ACTION(RETRY) RETRY(10)
**** For all other dlq messages, move those to designated queue *****
ACTION(FWD) FWDQ(UNPLAYABLE.DLQ) HEADER(YES)
******=========================================================*******

 

STARTING THE DLQHANDLER

Once the rule file is created, you can configure dlqhandler startup in the following ways:

1) Manually starting the dlqhandler and keep it running.
runmqdlq QMGR_DLQ QMGR_NAME < qrulefile.rul

2) Configure dlqhandler as a service in the queue manager
SPECIAL NOTE FOR WINDOWS SERVERS:
DEFINE SERVICE(dlqhandler) +
SERVTYPE(SERVER) +
CONTROL(MANUAL) +
STARTCMD(‘c:\var\bin\mqdlq.bat’) +
DESCR(‘dlqhandler service’) +
STARTARG(‘DLQ QMNAME C:\var\rulefiles\qrulefile.rul’) +
STOPCMD(‘c:\var\bin\stopdlh.bat’) +
STOPARG(‘DLQ QMNAME’) +
STDOUT(‘/path/dlq.log’) +
STDERR(‘/path/dlq_error.log’)

NOTE: For Windows, because of the nature of how it doesn’t handle redirects “<” like Linux does, a wrapper script must be written.

Contents of mqdlq.bat (start command):
echo alt ql(%1) get(enabled) | runmqsc %2
runmqdlq.exe %1 %2 %3

Contents of stopldh.bat (stop command):
echo alt ql(%1) get(disabled) | runmqsc %2

Enabling and disabling the DLQ essentially kills the dead letter handler.

3) Configure triggering at DLQ to start dlq handler whenever first message arrives on the queue.

You can do it by simply following the steps on how to configure triggering on any queue. Please refer the following link for more information on triggering.

Once you setup triggering, dlqhandler will be started by triggering based on the condition that you set. You don’t need to have dlqhandler running all the time as it can be started again by triggering. Due to above reason, you don’t need WAIT(YES) in the rule file, you can change it to WAIT(NO).

 

WANT TO LEARN MORE?

TxMQ delivers an RTP Starter Pack to accelerate participant onboarding.  You can learn more about our starter pack here: http://txmq.com/RTP/

Our deep industry experience and subject matter experts are available to solve your most complex challenges.  We deliver solutions and innovations to do business in an ever-changing world, to guide & support your organization through the digital transformation journey.

TxMQ
Imagine. Transform. Engage.
We’re here to help you work smart in the new economy.

This post was authored by John Carr, Principle Integration Architect at TxMQ. You can follow John on LinkedIn here.

Managed Services: Regain focus, improve performance and move forward.

Managed IT Services Help You Improve Your Team’s Performance.

There is no such thing as multitasking. It’s been scientifically proven that if you spread your focus too thin, something is going to suffer. Just like a computer system, if you are trying to work on too many tasks at once, it’s going to slow down, create errors and just not perform as expected.

You must regain your focus in order to improve performance and move forward.

The fact is, no matter what your industry or business is today, your success is dependent upon your IT team staying current, competent, and competitive within the industry. Right now there is more on your IT departments “plate” than ever before. Think about how much brain power and talent you’re misusing knowing that your best IT talent is spending the bulk of their efforts just managing the day to day. Keep in mind that most of these issues can be easily fixed, and even avoided with proper preparation.

How do you continuously put out fires, keep systems running smoothly, and still have time to plan for the future?

As the legendary Ron Swanson once said “Never half-ass two things. Whole-ass one thing.”
Ron Swanson Yep

Don’t go at it alone when you can supplement your team and resources.

Overworked employees have higher turnover (-$), make more mistakes (-$), and work slower (-$)(-$). This is costing you more than you can ever fully account for, though your CFO may try. Think about those IT stars that are so important to the success of your business. You may just lose them, starting another fire to put out when you’re trying to gain, train, and grow.

Managed IT Services Can Put You Back on Track!

No one knows your business better than you, and I’m just guessing but, I bet you’ve gotten pretty good at it by now. However, as any good leader or manager knows, without your focus on the future you could lose out as your industry changes, and if you didn’t notice it’s already changing.

To remain competitive, you need an advantage that can help you refocus your team and let you do you, because that’s what you do best.

At TxMQ we are not an expert in your business, and we would never claim to be one. Our Managed Services mission is to take care of the stuff you don’t have the resources, expertise, or the time for, and then we make it run at it’s best. You can refocus your full attention to improving your business.
Whether your producing widgets to change the world, a life saving drug, or providing healthy food for the masses, you don’t have to spread yourself thin. We Monitor, Manage and Maintain, Middleware Systems & Databases that power your business. As a provider we are technology and systems agnostic.
What we do is nothing you can’t do yourself or maybe, already are doing. If resources are scarce, putting extra work on your existing team can cost you more than it needs to. TxMQ’s Managed Services teams fill in the gaps within your existing IT resources to strengthen and solidify your systems, so that you can focus on everything else.

TxMQ’s Managed Services team helps you refocus, so can concentrate on growth and tackling your industry and business challenges.

If you’re interested in getting more sleep at night and learning more about our Managed Services please reach out or click below for more info.

Learn About Managed Services and Support With TxMQ Click Here!

IBM WebSphere Application Server (WAS) v.7 & v.8, and WebSphere MQ v.7.5 End of Support: April 30, 2018

Are you presently running on WAS versions 7 or 8?
   Are you leveraging WebSphere MQ version 7.5?

Time is running out, IBM WebSphere Application Server (WAS) v.7 & v.8, and WebSphere MQ v.7.5 support ends in less than 6 months. As of April 30th 2018, IBM will discontinue support on all WebSphere Application Server versions 7.0.x & v8.0.x; and WebSphere MQ v7.5.x.

It’s recommended that you migrate to WebSphere Application Server v.9 to avoid potential security issues that may occur on the early, unsupported versions of WAS (and Java).
It’s also recommended that you upgrade to IBM MQ version 9.0.x, to leverage new features, and avoid costly premium support fees from IBM.

Why should you go through an upgrade?

Many security risks can percolate when running back-level software, especially WAS running on older Java versions. If you’re currently running on software versions that are out of support, finding the right support team to put out your unexpected fires can be tricky and might just blow the budget.
Upgrading WAS & MQ to supported versions will allow you to tap into new and expanding capabilities, and updated performance enhancements while also protecting yourself from unnecessary, completely avoidable security risks and added support costs.

WebSphere Application Server v.9 Highlights

WebSphere Application Server v.9.0 offers unparalleled functionality to deliver modern applications and services quickly, securely and efficiently.

When you upgrade to v.9.0, you’ll enjoy several upgrade perks including:
  • Java EE 7 compliant architecture.
  • DevOps workflows.
  • Easy connection between your on-prem apps and IBM Bluemix services (including IBM Watson).
  • Container technology that enables greater development and deployment agility.
  • Deployment on Pivotal Cloud Foundry, Azure, Openshift, Amazon Web Services and Bluemix.
  • Ability to provision workloads to IBM cloud (for VMware customers).
  • Enhancements to WebSphere extreme scale that have improved response times and time-to-configuration.

 

IBM MQ v.9.0.4 Highlights

With the latest update moving to MQ V9.0.4, there are even more substantial updates of useful features for IBM MQ, even beyond what came with versions 8 (z/OS) & 8.5.

When you upgrade to v.9.0.4, enhancements include:
  • Additional commands supported as part of the REST API for admin.
  • Availability of a ‘catch-all’ for MQSC commands as part of the REST API for admin.
  • Ability to use a single MQ V9.0.4 Queue Manager as a single point gateway for REST API based admin of other MQ environments including older MQ versions such as MQ V9 LTS and MQ V8.
  • Ability to use MQ V9.0.4 as a proxy for IBM Cloud Product Insights reporting across older deployed versions of MQ.
  • Availability of an enhanced MQ bridge for Salesforce.
  • Initial availability of a new programmatic REST API for messaging applications.

This upgrade cycle also offers you the opportunity to evaluate the MQ Appliance. Talk to TxMQ to see if the MQ Appliance is a good option for your messaging environment.

What's your WebSphere Migration Plan? Let's talk about it!

Why work with an IBM Business Partner to upgrade your IBM Software?

You can choose to work with IBM directly – we can’t (and won’t) stop you – but your budget just might. Working with a premier IBM business partner allows you to accomplish the same task with the same quality, but at a fraction of the price IBM will charge you, with more personal attention and much speedier response times.
Also, IBM business partners are typically niche players, uniquely qualified to assist in your company’s migration planning and execution. They’ll offer you and your company much more customized and consistent attention. Plus, you’ll probably be working with ex-IBMers anyway, who’ve turned in their blue nametags to find greater opportunities working within the business partner network.

There are plenty of things to consider when migrating your software from outdated versions to more current versions.

TxMQ is a premier IBM business partner that works with customers to oversee and manage software migration and upgrade planning. TxMQ subject matter experts are uniquely positioned with relevant experience, allowing them to help a wide range of customers determine the best solution for their migration needs.
Get in touch with us today to discuss your migration and back-level support options. It’s never too late to begin planning and executing your version upgrades.

To check on your IBM Software lifecycle, simply search your product name and version on this IBM page or, give TxMQ a call…

Oracle Announces End of Support For JD Edwards EnterpriseONE Technology Foundation

Long title. What’s this about?
In short, back in 2010, Oracle announced the withdrawal of JDE EnterpriseOne Technology Foundation. The final nail in this coffin comes on September 30, 2016, when technical support officially ends.
What this means is that for many customers (and I’ll get into particulars shortly) there’s a requirement to make a critical decision to either move to Oracle’s Red Stack, or procure new IBM licenses in order to remain on IBM’s Blue Stack.
There’s a variety of customers running the stack, and nearly as wide a variety of options for how companies may have deployed their JDE solution. WebSphere with DB2 is the original and most common. WebSphere with Oracle as the backend is another common combo. And there’s a variety of other blends of supported web/application servers, database servers and related middleware.
Regardless of the configuration, in most cases, these products were part of the bundled solution that customers licensed from Oracle, and now a decision point’s been reached.
This doesn’t mean Oracle’s dropping support for IBM products. This does mean there’s a change in the way they’re licensed.
So what is “Technology Foundation”?
To quote from Oracle’s documents verbatim: Technology Foundation is an Oracle product that provides license for the following components:

  • JD Edwards EnterpriseOne Core Tools and Infrastructure, the native design time and runtime tools required to run JD Edwards EnterpriseOne application modules
  • IBM DB2 for Linux, Unix, and Windows, limited to use with JD Edwards EnterpriseOne programs
  • IBM WebSphere Application Server, limited to use with JD Edwards EnterpriseOne programs
  • IBM WebSphere Portal, as contained in JD Edwards EnterpriseOne Collaborative Portal

Technology Foundation is also referred to by the nickname “Blue Stack.”
If your license for JD Edwards EnterpriseOne applications includes an item called “Technology Foundation” or “Technology Foundation Upgrade,” this affects you.
If there are any other terms like “Oracle Technology Foundation,” then this change does NOT affect you. This is also different then the foundation for JD Edwards World.
So what now? In short, if you have Blue Stack, you should contact TxMQ or IBM immediately to acquire your own licensed products to continue to run your Oracle solution. TxMQ can offer aggressive discounts to Oracle customers subject to certain terms and conditions. Contact us for pricing details. We can help with pricing, as well as with any needed migration planning and implementation support.
Contact chuck@txmq.com for immediate price quotes and migration planning today.
Image from Håkan Dahlström.

How Do I Support My Back-Level IBM Software (And Not Blow The Budget)?

So you’re running outdated, obsolete, out-of-support versions of some of your core systems.? WebSphere MQ maybe? or WebSphere Process Server or Datapower…the list is endless.
Staff turnover may be your pain point – a lack of in-house skills – or maybe it’s lack of budget to upgrade to newer, in-support systems. A lost of times it’s just a matter of application dependencies, where you can’t get something to work in QA, and you’re not ready to migrate to current versions just yet.
The problem is that management requires you to be under support. So you get a quote from IBM to support your older software, and the price tag is astronomical – not even in the same solar system as your budget.
The good news is you do have options.

We were able to offer a 6-month support package, which eventually ran 9 months in total. Total cost was under $1,000 a month.

Here at TxMQ, we have a mature and extensive migration practice, but we also offer 24×7 support (available as either 100% onshore, or part onshore, part offshore) for back-level IBM software and product support – all at a fraction of IBM rates.
Our support starts at under $1,000 a month and scales with your size and needs.
TxMQ has been supporting IBM customers for over 35 years. We have teams of architects, programmers, engineers and others across the US and Canada supporting a variety of enterprise needs.
Case Studies
A medical-equipment manufacturer planned to migrate from unsupported versions of MQ and Message Broker. The migration would run 6 to 9 months past end-of-support, but the quote from IBM for premium support was well beyond budget.
The manufacturer reached out to TxMQ and we were able to offer a 6-month support package, which eventually ran 9 months in total. Total cost was under $1,000 a month.
Another customer (a large health-insurance payer) faced a similar situation. This customer was running WebSphere Process Server, Ilog, Process Manager, WAS, MQ, WSRR, Tivoli Monitoring, and outdated DataPower appliances. TxMQ built an executed a comprehensive “safety net” plan to support this customer’s entire stack during a very extensive migration period.
It’s never a good idea to run unsupported software – especially in these times of highly visible compliance and regulatory requirements.
In addition to specific middleware and application support, TxMQ can also work to build a compliance framework to ensure you’re operating within IBM’s license restrictions and requirements – especially if you’re running highly virtualized environments.
Get in touch today!
(Image from Torkild Retvedt)

Case Study: Middleware Health Check

Project Description

An online invoicing and payment management company (client) requested a WebSphere Systems Health Check to ensure their systems were running optimally and prepared for an expected increase in volume. The project scope called for onsite current state data collection and analysis followed by offsite detailed data analysis and White Paper preparation and delivery. Next, our consultant performed the recommended changed (WebSphere patches and version upgrades).

Click here to learn more about our Middleware Health Check.

The Situation

The client described their systems as “running well” but they were concerned they may have problems as they experienced additional load (700% estimated); memory consumption was a particular concern. TxMQ completed the Health Check and worked with representatives from the client for access to production Linux boxes, web servers, application servers, portal servers, database servers and LDAP servers. Additional client representatives would be available for application-specific questions.

The Response

Monitoring of the components had to be completed during a normal production day. The web servers, application servers, database server, directory server and Tomcat server all needed to be monitored for several hours. The normal production day typically showed a low volume of transactions so the when the monitoring statistics began they were all very normal; resource usage on the boxes was very low. Log files were extracted from the web servers, directory server, database server, deployment manager, application servers and Tomcat (batch) server. Verbose garbage collection was enabled for one of the application servers for analysis and a Javacore and Heap Dump was generated on an application server to analyze threads and find potential memory leaks.

Monitoring and analysis tool options were discussed with the client. TxMQ recommended additional IBM tools and gave a tutorial on the WebSphere Performance Viewer (built into the WebSphere Admin Console). In addition, TxMQ’s consultant sent members of the client’s development team links to download IBM’s Heap Analyzer and Log Analyzer (very useful for analyzing WAS System Logs and Heap Dumps). TxMQ’s consultants met with the client’s development and QA staff to debrief them on the data gathered.

The Results

Overall, the architecture was sound and running well but the WebSphere software had not been patched for several years and code-related errors filled the system logs. There were many potential memory leaks which could have caused serious response and stability problems as the application scales for more users.

The QA team ran stress tests which indicated that the response times would get worse very quickly as more users were added. Further, the version of software and web server plugin was 6.1.0 and vulnerable to many security risks.

The HTTP access and error logs had no unusual or excessive entries. The http_plugin logs were very large and were rotated – making it faster and easier to access the most recent activity.

One of the web servers was using much more memory than the other although it should have been configured exactly the same. The production application servers were monitored over a three-day period and didn’t exhibit any outward signs of stress; the CPU was very low, memory was not maxed out, and the threads & pools were minimally used. There were a few configuration errors and warnings to research but the Admin Console settings were all well within norms.

Items of concern:

1) A large number of application code related errors in the logs; and
2) The memory consumption grows dramatically during the day.

These conditions can be caused by unapplied software patches and code-related issues. In a 24-hour period, Portal node 3 experienced 66 errors and 227 warnings in the SystemOut log and 1396 errors in the SystemErr log. These errors take system resources to process, will cause unpredictable application behavior, and can cause hung threads and memory leaks. The production database server was not stressed – it has plenty of available CPU, memory and disk space. The DB2 diagnostic log had recorded around 4536 errors and 17,854 warnings in the previous few months. The Tivoli Directory server was not stressed – plenty of available CPU, memory and disk space. The SystemOut log recorded 107 errors and 8 warnings in the previous year -­ many of these could be fixed by applying the latest Tivoli Directory Server patch (6.1.0.53). The Batch Job (Tomcat) server was not stressed – plenty of available CPU, memory and disk space. The catalina.out log file is 64Mb and contained many errors and warnings.

The HealthCheck written analysis was delivered to the client with recommended patches and application classes to investigate for errors and memory leaks. In addition, a long-­term plan was outlined to upgrade to a newer version of WebSphere Application Server and migrate off WebSphere Portal Server (since its features were not needed).

Photo courtesy of Flickr contributor Tristan Schmurr

General Best Practices for WebSphere Application Environments

I found a great article written by Asim Saddal outlining  a list of general best practices to apply to any WebSphere Application Server V7 and V8 environment. It is copied exactly below. If you need the original article source, you can find it here.

However, some of the recommendations only apply to specific conditions and scenarios. These recommendations could be used to set up any WebSphere environment.

General Best Practices for WebSphere Application Environments

1. All WebSphere Application processes should be running as non-admin/root user.
It’s not a good practice to run a process as an admin/root user. For obvious reasons, you don’t want more folks to know about the admin/root password and generally the WebSphere admins are not the system admins. Create a services user account on the box and use it for the WebSphere Application’s start and stop purposes.

2. Enabled Global Security.
By default, the WebSphere Application Server enables administrative security. Thus, for the most part, the infrastructure provides for reasonable authentication, authorization, and encryption of administrative traffic by default. When administrative security is enabled, the WebSphere Application Server’s internal links between the deployment manager and the application servers and traffic from the administrative clients (Web and command line) to the deployment manager are encrypted and authenticated. Among other things, this means that administrators will be required to authenticate when running the administrative tools.

3. Enabled Application Security.
In addition to leveraging the application server’s security for administration, it’s strongly recommend that you leverage it for application security. Doing so gives your applications access to a strong and robust standards-based security infrastructure. Applications that didn’t leverage application server security were typically found to have serious security holes. Designing and implementing a secure distributed infrastructure is not easy.

To enable application security, go to the global security panel and select Enable application security.

4. Configure WebSphere Security with proper LDAP repository.
WebSphere security supports different configurations, including LDAP servers, local users and local operating system levels users. However, it’s recommended that you use a proper LDAP server for this purpose.

5. Leverage Administrative roles.
WebSphere Application Server allows for a variety of administrative roles depending on the version: Administrator, Operator, Monitor, Configurator, AdminSecurityManager, iscadmins, Deployer, or Auditor. These roles make it possible to give individuals (and automated systems) access that’s appropriate to their level of need. It’s strongly recommended that you take advantage of roles whenever possible.

By using the less powerful roles of monitor and operator, you can restrict the actions an administrator can take. For example, you can give the less senior administrators just the ability to start and stop servers and the night operators just the ability the watch the system (monitor). These actions greatly limit the risk of damage by trusting people with only the permissions they need.

6. Use HTTP Server as an interface for the Applications.
Use HTTP servers in front of an application layer, i.e., WebSphere Application. Don’t allow communications directly with WebSphere’s http web container port from either a load balance or from browsers.

7. HTTP and WebSphere on the Same box.
At least in higher environments, install and configure the http server on a different box than the WebSphere box. In the lower environments the same box can be used for both layers.

8. Logs on External Drive.
At least in higher environments, write the WebSphere and application log files to an external drive, so it won’t fill up the server’s file space.

9. Logs Archive.
Depending on the application, rotate and clean up the logs in a timely manner.
10. Read-only Logs Access for Developer.

If it’s okay with the security team, grant developers read-only access for WebSphere and the applications logs on the external drive.

11. Alternate Log Access for Developers.
To enable developers to view the production application and WebSphere logs, host those shared folders from the web server instead of giving them access to those boxes. Once the logs are hosted from the web server, developers need only a web browser to view those files from their computers.

12. Log Level.
Configure log level to error in high environments. Logs in the higher environments don’t need to produce unnecessary information. In the lower environments it can be set to info or debug level.

13. Leverage WebSphere Application Servers’ high availability and failover capabilities.
Out-of-the-box WebSphere support high availability and failover functionality. There is no need to use any external component or product for this. One of the key benefits is that  user http session can be shared within the cluster members and, in the case of failover, the other active cluster members can resume the activity using same session.

14. Minimum Cluster Members in Cluster.
In the WebSphere clustered environment, define and create at least three cluster members. In the case of failover with two cluster members, not only the entire load will shift to one node but it also becomes a single point of failure. With three nodes, at least the load will still be distributed to two nodes and there is no single point of failure.

15. Database and WebSphere on Same box.
At least in higher environments, separate the application layer from the data layer and install them on different boxes. In the lower environments the same box can be used for both layers.

16. Use Type-4 JDBC Drivers.
Type-4 JDBC drivers don’t require any component to be installed on the application layer. For the type-2 and type-3, the database’s client needs to install on the WebSphere box.

17. Protect application server to database link.
As with any other network link, confidential information can be written to or read from the database. Most databases support some form of network encryption and you should leverage it.

18. Script based WebSphere Administration.
In general, it’s good practice to use scripts to avoid human errors during the deployment and configuration, especially in higher environments. However, it requires an investment in time and resources to develop these scripts, especially if it is first time and / or script-based administration is not part of the current culture. Once the scripts are created, they can be used in all environments and maybe automate some of the tasks.

19. Monitoring.
Use proper application and infrastructure runtime monitoring tools that can monitor environments and application thresholds and potentially alert you to problems before they cause service interruptions.

20. EAR vs WAR Files.
According to J2EE specs, EAR file should be deployed in WebSphere. However, WebSphere does support deploying WARs and upgrade class functionality. Developers should produce EAR files from their development tool or generate EAR should it be created from the deployment scripts before deploying the application in WebSphere.

21. Don’t run samples in production.
WebSphere Application Server ships with several excellent examples to demonstrate various parts of the WebSphere Application Server. These samples are not intended for use in a production environment. Don’t run them there, as they create significant security risks. In particular, the showCfg and snoop servlets can provide an outsider with tremendous amounts of information about your system. This is precisely the type of information you don’t want to give a potential intruder. This is easily addressed by not installing the samples during the profile creation.

22. Environments.
Generally, it’s good to have more environments. Typically four would be a sufficient enough: development, QA, staging and production. Development and QA environments don’t need a lot of hardware resources. It’s recommended that the staging environment be a replica of production in terms of application data and hardware resources. The staging environment can also be used for stress testing and / or for production support.

23. Performance Tuning.
Tune WebSphere application servers properly for each application. Performance tuning includes optimization of a number of web container threads, JVM heap sizes, JDBC connections, OS tuning, etc. After configuring these parameters to optimize values, boost the application performance. Stress / staging environment should be used for load testing.

24. Separate your production network from your intranet.
Most organizations today understand the value of a DMZ that separates the outsiders on the Internet from the intranet. However, far too many organizations fail to realize that many intruders are on the inside. You need to protect against internal as well as external threats. Just as you protect yourself against the large untrusted Internet, you should also protect your production systems from the large and untrustworthy intranet.

25. Separate your production networks from your internal network using firewalls.
These firewalls, while likely more permissive than the Internet-facing firewalls, can still block numerous forms of attack.

Keep up-to-date with patches and fixes. As with any complex product, IBM occasionally finds and fixes security bugs in WebSphere Application Server, Virtual Enterprise, Datapower and other products. It’s crucial that you keep up-to-date on these fixes. It’s advisable that you subscribe to support bulletins for the products you use and, in the case of WebSphere Application Server and WebSphere Virtual Enterprise, monitor the security bulletin site for your version. Those bulletins often contain notices of recently discovered security bugs and the fixes. You can be certain that potential intruders learn of those security holes quickly. The sooner you act the better.

More information on WebSphere Application Server security, including recommendations on hardening the WebSphere Application Server infrastructure, is available on the WebSphere Application Server security page.

© 2008 SYS-CON Media Inc.