Scalability & High Availability

By Eugene Ciurana

20,871 Downloads · Refcard 43 of 151 (see them all)


The Essential Scalability & High Availability Cheat Sheet

Scalability and availability are mentioned so often that it is hard to know what they actually mean in each case. The terms are often used interchangeably, which creates confusion that results in poorly managed expectations and unrealistic metrics. This DZone Refcard provides you with the tools to define these terms so that your team can implement mission-critical systems with well-understood performance goals. This Refcard also covers: An Overview of Scalability and High Availability, Implementing Scalable Systems, Caching Strategies, Clustering, Redundancy and Fault Tolerance, Hot Tips, and More.

OVERVIEW

Scalability, High Availability, and Performance

The terms scalability, high availability, performance, and mission-critical can mean different things to different organizations, or to different departments within an organization. They are often used interchangeably, which creates confusion that results in poorly managed expectations, implementation delays, or unrealistic metrics. This Refcard provides you with the tools to define these terms so that your team can implement mission-critical systems with well-understood performance goals.

Scalability

Scalability is the property of a system or application to handle larger amounts of work, or to be easily expanded, in response to increased demand for network, processing, database access, or file system resources.

Horizontal scalability
A system scales horizontally, or out, when it's expanded by adding new nodes with identical functionality to existing ones, redistributing the load among all of them. SOA systems and web servers scale out by adding more servers to a load-balanced network so that incoming requests may be distributed among all of them. Cluster is a common term for describing a scaled out processing system.

Figure 1: Clustering

Vertical scalability
A system scales vertically, or up, when it's expanded by adding processing, main memory, storage, or network interfaces to a node to satisfy more requests per system. Hosting services companies scale up by increasing the number of processors or the amount of main memory to host more virtual servers in the same hardware.

Figure 2: Virtualization

High Availability

Availability describes how well a system provides useful resources over a set period of time. High availability guarantees an absolute degree of functional continuity within a time window expressed as the relationship between uptime and downtime.

A = 100 – (100*D/U), D ::= unplanned downtime, U ::= uptime; D, U expressed in minutes

Uptime and availability don't mean the same thing. A system may be up for a complete measuring period, but may be unavailable due to network outages or downtime in related support systems. Downtime and unavailability are synonymous.

Measuring Availability
Vendors define availability as a given number of "nines" like in Table 1, which also describes the number of minutes or seconds of estimated downtime in relation to the number of minutes in a 365-day year, or 525,600, making U a constant for their marketing purposes.

Availability %   Downtime in Minutes   Downtime per Year   Vendor Jargon
90               52,560.00             36.5 days           one nine
99               5,256.00              3.65 days           two nines
99.9             525.60                8.8 hours           three nines
99.99            52.56                 53 minutes          four nines
99.999           5.26                  5.3 minutes         five nines
99.9999          0.53                  32 seconds          six nines

Table 1: Availability as a Percentage of Total Yearly Uptime
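
The formula and the rows of Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the class and method names are invented for this example, and it follows the vendor convention of treating a 365-day year (525,600 minutes) as the measuring period.

```java
/** Availability arithmetic based on A = 100 - (100 * D / U). Names are hypothetical. */
final class Availability {
    /** Minutes in a 365-day year, the constant vendors use for U. */
    static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

    /** Availability percentage for D minutes of unplanned downtime over U minutes. */
    static double percent(double downtimeMinutes, double periodMinutes) {
        return 100.0 - (100.0 * downtimeMinutes / periodMinutes);
    }

    /** Yearly downtime budget, in minutes, implied by an availability percentage. */
    static double yearlyDowntimeMinutes(double availabilityPercent) {
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }
}
```

For instance, Availability.yearlyDowntimeMinutes(99.9) comes out at roughly 525.6 minutes, which matches the "three nines" row.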

Analysis
High availability depends on the expected uptime defined for system requirements; don't be misled by vendor figures. The meaning of having a highly available system and its measurable uptime are a direct function of a Service Level Agreement. Availability goes up when factoring planned downtime, such as a monthly 8-hour maintenance window. The cost of each additional nine of availability can grow exponentially. Availability is a function of scaling the systems up or out and implementing system, network, and storage redundancy.

Service Level Agreement (SLA)

SLAs are the negotiated terms that outline the obligations of the two parties involved in delivering and using a system, like:

  • System type (virtual or dedicated servers, shared hosting)
  • Levels of availability
    • Minimum
    • Target
  • Uptime
    • Network
    • Power
    • Maintenance windows
  • Serviceability
  • Performance and Metrics
  • Billing

SLAs can bind obligations between two internal organizations (e.g. the IT and e-commerce departments), or between the organization and an outsourced services provider. The SLA establishes the metrics for evaluating the system performance, and provides the definitions for availability and the scalability targets. It makes no sense to talk about any of these topics unless an SLA is being drawn or one already exists.

IMPLEMENTING SCALABLE SYSTEMS

SLAs determine whether systems must scale up or out. They also drive the growth timeline. A stock trading system must scale in real-time within minimum and maximum availability levels. An e-commerce system, in contrast, may scale in during the "slow" months of the year, and scale out during the retail holiday season to satisfy much larger demand.

Load Balancing

Load balancing is a technique for minimizing response time and maximizing throughput by spreading requests among two or more resources. Load balancers may be implemented in dedicated hardware devices, or in software. Figure 3 shows how load-balanced systems appear to the resource consumers as a single resource exposed through a well-known address. The load balancer is responsible for routing requests to available systems based on a scheduling rule.

Figure 3: Load Balancer

Scheduling rules are algorithms for determining which server must service a request. Web applications and services are balanced by following round robin scheduling rules. Caching pools are balanced by applying frequency rules and expiration algorithms. Applications where stateless requests arrive with a uniform probability for any number of servers may use a pseudo-random scheduler. Applications like music stores, where some content is statistically more popular, may use asymmetric load balancers to shift the larger number of popular requests to higher performance systems, serving the rest of the requests from less powerful systems or clusters.
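
As a rough illustration of the simplest scheduling rule, the sketch below rotates through a fixed pool of servers. It is a minimal example under stated assumptions, not how any particular load balancer is implemented: the class name and the use of plain strings as server identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin scheduling rule: each request goes to the next server in a fixed rotation. */
final class RoundRobinScheduler {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinScheduler(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers); // defensive copy of the pool
    }

    /** Returns the server that should handle the next request. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

With a pool of "a", "b", and "c", successive calls to pick() return a, b, c, and then a again. An asymmetric balancer would replace the rotation with a rule that favors the more powerful servers.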

Persistent Load Balancers
Stateful applications require persistent or sticky load balancing, where a consumer is guaranteed to maintain a session with a specific server from the pool. Figure 4 shows a sticky balancer that maintains sessions from multiple clients. Figure 5 shows how the cluster maintains sessions by sharing data using a database.

Figure 4: Sticky Load Balancer
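
One simple way to approximate stickiness is to derive the server from the session identifier, so that the same session always lands on the same node. The sketch below uses that hashing approach as an assumption for illustration; real balancers often rely on cookies or a session table instead, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sticky routing sketch: a given session id always maps to the same server. */
final class StickyRouter {
    private final List<String> servers;

    StickyRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    /** Returns the server that owns the given session. */
    String serverFor(String sessionId) {
        // String.hashCode is deterministic, so the mapping is stable
        // as long as the pool itself does not change.
        int index = Math.floorMod(sessionId.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

The trade-off of this design is that adding or removing a server remaps many sessions, which is one reason clusters also share session data through a database.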

Common Features of a Load Balancer

  • Asymmetric load distribution - assigns some servers to handle a bigger load than others
  • Content filtering - inbound or outbound
  • Distributed Denial of Service (DDoS) attack protection
  • Firewall
  • Payload switching - sends requests to different servers based on URI, port, and/or protocol
  • Priority activation - adds standby servers to the pool
  • Rate shaping - ability to give different priority to different traffic
  • Scripting - reduces human interaction by implementing programming rules or actions
  • SSL offloading - hardware-assisted encryption frees web server resources
  • TCP buffering and offloading - throttles requests to servers in the pool

Figure 5: Database Sessions

CACHING STRATEGIES

Stateful load balancing techniques require data sharing among the service providers. Caching is a technique for sharing data that are expensive to compute or fetch among multiple consumers or servers. Data are stored and retrieved in a subsystem that provides quick access to a copy of the frequently accessed data.

Caches are implemented as an indexed table where a unique key is used for referencing some datum. Consumers access data by checking (hitting) the cache first and retrieving the datum from it. If it's not there (cache miss), then the costlier retrieval operation takes place and the consumer or a subsystem inserts the datum into the cache.
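
The hit and miss flow described above can be sketched in a few lines. This is a minimal, single-threaded illustration: the class name is hypothetical, and the backing store is modeled as a plain function from key to value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Read path of a cache: check the table first, fall back to the costly lookup on a miss. */
final class ReadCache<K, V> {
    private final Map<K, V> table = new HashMap<>(); // not thread-safe; sketch only
    private final Function<K, V> backingStore;
    private int misses;

    ReadCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    V get(K key) {
        V cached = table.get(key);
        if (cached != null) {
            return cached; // cache hit
        }
        misses++; // cache miss: perform the costlier retrieval
        V loaded = backingStore.apply(key);
        if (loaded != null) {
            table.put(key, loaded); // insert the datum for later requests
        }
        return loaded;
    }

    /** Number of requests that had to go to the backing store. */
    int misses() {
        return misses;
    }
}
```

Requesting the same key twice reaches the backing store only once; the second request is served from the table.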

Write Policy

The cache may become stale if the backing store changes without updating the cache. A write policy for the cache defines how cached data are refreshed. Some common write policies include:

  • Write-through: every write to the cache triggers a synchronous write to the backing store
  • Write-behind: updated entries are marked as dirty in the cache table, and the backing store is updated only when a dirty datum is requested
  • No-write allocation: only read requests are cached, under the assumption that the data won't change over time but are expensive to retrieve
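
The difference between the first two policies is easier to see side by side. In the sketch below the backing store is modeled as a map, the class and method names are hypothetical, and the moment at which flush() is called is left to the caller; that trigger is an assumption of this example rather than something the policies above prescribe.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Contrasts write-through and write-behind against a map standing in for the backing store. */
final class WritePolicyCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();
    private final Map<String, String> backingStore;

    WritePolicyCache(Map<String, String> backingStore) {
        this.backingStore = backingStore;
    }

    /** Write-through: the cache and the backing store are updated in the same call. */
    void writeThrough(String key, String value) {
        cache.put(key, value);
        backingStore.put(key, value);
    }

    /** Write-behind: only the cache is updated, and the entry is marked dirty. */
    void writeBehind(String key, String value) {
        cache.put(key, value);
        dirty.add(key);
    }

    /** Pushes every dirty entry to the backing store and clears the dirty marks. */
    void flush() {
        for (String key : dirty) {
            backingStore.put(key, cache.get(key));
        }
        dirty.clear();
    }
}
```

Until flush() runs, a write-behind entry exists only in the cache, which is why this policy trades durability for lower write latency.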

Application Caching

  • Implicit caching happens when there is little or no programmer participation in implementing the caching. The program executes queries and updates using its native API and the caching layer automatically caches the requests independently of the application. Example: Terracotta (http://www.terracotta.org).
  • Explicit caching happens when the programmer participates in implementing the caching API and may also implement the caching policies. The program must import the caching API into its flow in order to use it. Examples: memcached (http://www.danga.com/memcached) and Oracle Coherence (http://coherence.oracle.com).

In general, implicit caching systems are specific to a platform or language. Terracotta, for example, only works with Java and JVM-hosted languages like Groovy. Explicit caching systems may be used with many programming languages and across multiple platforms at the same time. memcached works with every major programming language, and Coherence works with Java, .NET, and native C++ applications.

Web Caching

Web caching is used for storing documents or portions of documents ("particles") to reduce server load, bandwidth usage, and lag for web applications. Web caching can exist on the browser (user cache) or on the server, the topic of this section. Web caches are invisible to the client and may be classified in any of these categories:

  • Web accelerators: they operate on behalf of the server of origin and are used for expediting access to heavy resources, like media files. Content distribution networks (CDNs) are an example of web acceleration caches; Akamai, Amazon S3, and Nirvanix are examples of this technology.
  • Proxy caches: they serve requests to a group of clients that may all have access to the same resources. They can be used for content filtering and for reducing bandwidth usage. Squid, Apache, and ISA Server are examples of this technology.

Distributed Caching

Caching techniques can be implemented across multiple systems that serve requests for multiple consumers and from multiple resources. These are known as distributed caches, like the setup in Figure 7. Akamai is an example of a distributed web cache. memcached is an example of a distributed application cache.

Figure 7: Distributed Cache

CLUSTERING

A cluster is a group of computer systems that work together to form what appears to the user as a single system. Clusters are deployed to improve services availability or to increase computational or data manipulation performance. In terms of equivalent computing power, a cluster is more cost-effective than a monolithic system with the same performance characteristics.

The systems in a cluster are interconnected over high-speed local area networks like gigabit Ethernet, fiber distributed data interface (FDDI), Infiniband, Myrinet, or other technologies.

Figure 8: Load Balancing Cluster

Load-Balancing Cluster (Active/Active): Distribute the load among multiple back-end, redundant nodes. All nodes in the cluster offer full-service capabilities to the consumers and are active at the same time.

High Availability Cluster (Active/Passive): Improve services availability by providing uninterrupted service through redundant nodes that eliminate single points of failure. High availability clusters require two nodes at a minimum, a "heartbeat" to detect that all nodes are ready, and a routing mechanism that will automatically switch traffic if the main node fails.
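
The heartbeat and routing mechanism can be sketched as follows. This is a simplified illustration, not a production failure detector: the class name, the node identifiers, and the timeout are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
/** Active/passive routing driven by heartbeats from the main node. */
final class HeartbeatFailover {
    private final String primary;
    private final String standby;
    private final long timeoutMillis;
    private long lastPrimaryHeartbeatMillis;

    HeartbeatFailover(String primary, String standby, long timeoutMillis, long nowMillis) {
        this.primary = primary;
        this.standby = standby;
        this.timeoutMillis = timeoutMillis;
        this.lastPrimaryHeartbeatMillis = nowMillis;
    }

    /** Records a heartbeat received from the primary node. */
    void heartbeatFromPrimary(long nowMillis) {
        lastPrimaryHeartbeatMillis = nowMillis;
    }

    /** Routes to the primary while its heartbeat is fresh, otherwise to the standby. */
    String route(long nowMillis) {
        boolean primaryAlive = nowMillis - lastPrimaryHeartbeatMillis <= timeoutMillis;
        return primaryAlive ? primary : standby;
    }
}
```

In this sketch traffic returns to the primary as soon as its heartbeat resumes. Whether to fail back automatically or wait for an operator is a design choice that a real cluster would make explicitly.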

Figure 10: Grid

Grid: Process workloads defined as independent jobs that don't require data sharing among processes. Storage or network may be shared across all nodes of the grid, but intermediate results have no bearing on other jobs' progress or on other nodes in the grid. An example is a Cloudera MapReduce cluster (http://www.cloudera.com).

Figure 11: Computational Clusters

Computational Clusters: Execute processes that require raw computational power instead of executing transactional operations like web or database clusters. The nodes are tightly coupled, homogeneous, and in close physical proximity. They often replace supercomputers.

REDUNDANCY AND FAULT TOLERANCE

Redundant system design depends on the expectation that any system component failure is independent of failure in the other components.

Fault tolerant systems continue to operate in the event of component or subsystem failure; throughput may decrease but overall system availability remains constant. Faults in hardware or software are handled through component redundancy. Fault tolerance requirements are derived from SLAs. The implementation depends on the hardware and software components, and on the rules by which they interact.

Fault Tolerance SLA Requirements

  • No single point of failure – redundant components ensure continuous operation and allow repairs without disruption of service
  • Fault isolation – problem detection must pinpoint the specific faulty component
  • Fault propagation containment – faults in one component must not cascade to others
  • Reversion mode – set the system back to a known state

Redundant clustered systems can provide higher availability, better throughput, and fault tolerance. The A/A cluster in Figure 12 provides uninterrupted service for a scalable, stateless application.

Figure 12: A/A Fault Tolerance and Recovery

Some stateful applications may only scale up; the A/P cluster in Figure 13 provides uninterrupted service and disaster recovery for such an application. A/A configurations provide failure transparency. A/P configurations may provide failure transparency at a much higher cost because automatic failure detection and reconfiguration are implemented through a feedback control system, which is more expensive and trickier to implement.

Figure 13: A/P Fault Tolerance and Recovery

Enterprise systems most commonly implement A/P fault tolerance and recovery through fault transparency by diverting services to the passive system and bringing it on-line as soon as possible. Robotics and life-critical systems may implement probabilistic, linear model, fault hiding, and optimization control systems instead.

Cloud Computing

Cloud computing describes applications running on distributed computing resources owned and operated by a third party.

End-user apps are the most common examples. They utilize the Software as a Service (SaaS) and Platform as a Service (PaaS) computing models.

Figure 14: Cloud Computing Configuration

Cloud Services Types

  • Web services – Salesforce.com, USPS, Google Maps
  • Service platforms – Google App Engine, Amazon Web Services (EC2, S3, CloudFront), Nirvanix, Akamai, MuleSource

Fault Detection Methods

Fault detection methods must provide enough information to isolate the fault and execute automatic or assisted failover action. Some of the most common fault detection methods include:

  • Built-in Diagnostics
  • Protocol Sniffers
  • Sanity Checks
  • Watchdog Checks

Criticality is defined as the number of consecutive faults reported by two or more detection mechanisms over a fixed time period. A fault detection mechanism is useless if it reports every single glitch (noise) or if it fails to report a real fault over a number of monitoring periods.
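
One possible reading of this definition is sketched below: a monitoring period counts toward criticality only when at least two detection mechanisms report a fault, and a period without such agreement resets the count. The class name, the reset rule, and the failover threshold are assumptions made for illustration.

```java
import java.util.Set;

/** Counts consecutive monitoring periods in which two or more mechanisms report a fault. */
final class CriticalityCounter {
    private int consecutive;

    /** Feeds one monitoring period and returns the running criticality. */
    int recordPeriod(Set<String> mechanismsReportingFault) {
        if (mechanismsReportingFault.size() >= 2) {
            consecutive++;
        } else {
            consecutive = 0; // a glitch seen by fewer than two mechanisms is treated as noise
        }
        return consecutive;
    }

    /** True once the criticality reaches the given threshold. */
    boolean shouldFailOver(int threshold) {
        return consecutive >= threshold;
    }
}
```

Requiring agreement between mechanisms filters out single glitches, while the threshold keeps a real fault from going unreported for too many periods.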

SYSTEM PERFORMANCE

Performance refers to the system throughput under a particular workload for a defined period of time. Performance testing validates implementation decisions about the system throughput, scalability, reliability, and resource usage. Performance engineers work with the development and deployment teams to ensure that the system's non-functional requirements like SLAs are implemented as part of the system development lifecycle. System performance encompasses hardware, software, and networking optimizations.

Hot Tip

Performance testing efforts must begin at the same time as the development project and continue through deployment.

The performance engineer's objective is to detect bottlenecks early and to collaborate with the development and deployment teams on eliminating them.

System Performance Tests

Performance specifications are documented along with the SLA and with the system design. Performance troubleshooting includes these types of testing:

  • Endurance testing - identifies resource leaks under the continuous, expected load.
  • Load testing - determines the system behavior under a specific load.
  • Spike testing - shows how the system operates in response to dramatic changes in load.
  • Stress testing - identifies the breaking point for the application under dramatic load changes for extended periods of time.

Software Testing Tools

There are many software performance testing tools in the market. Some of the best are released as open-source software. A comprehensive list of those is available from:

http://www.opensourcetesting.org/performance.php

These include tools for Java, native, PHP, .NET, and other languages and platforms.

Staying Current

Do you want to know about specific projects and cases where scalability, high availability, and performance are the hot topic? Join the scalability newsletter:

http://eugeneciurana.com/scalablesystems

About The Author

Eugene Ciurana

Eugene Ciurana is an open-source evangelist who specializes in the design and implementation of mission-critical, high-availability large scale systems. As Director of Systems Infrastructure, he and his team designed and built a 100% SOA and cloud system that enables millions of Internet-ready educational and handheld products and services. As chief liaison between Walmart.com Global and the ISD Technology Council, he led the official adoption of Linux and other open-source technologies at Walmart Stores Information Systems Division. He's also designed high performance systems for major financial institutions and many Fortune 100 companies in the United States and Europe.

Publications

  • Developing with the Google App Engine
  • Best Of Breed: Building High Quality Systems, Within Budget, On Time, and Without Nonsense
  • The Tesla Testament: A Thriller

Web site
http://eugeneciurana.com

Recommended Book

Google App Engine

Developing with Google App Engine introduces Google App Engine, a platform that provides developers and users with the infrastructure that Google itself uses for developing and deploying massively scalable applications. Using Python as the primary programming tool, Developing with Google App Engine makes it easy to implement scalability and high performance features like distributed databases, clustering, stateless applications, and sophisticated data caching.


Getting Started with JBoss Enterprise Application Platform 5

By Scott Marlow, Jaikiran Pai, Shelly McGowan, Brian Stansberry and Len DiMaggio

15,334 Downloads · Refcard 97 of 151 (see them all)


The Essential JBoss EAP 5 Cheat Sheet

JBoss Enterprise Application Platform 5 is an open source implementation of the Java EE suite of services. This DZone Refcard provides an in-depth introduction to JBoss Enterprise Application Platform 5. It comprises a set of offerings for enterprise customers who are looking for preconfigured profiles of JBoss Enterprise Middleware components that have been tested and certified together to provide an integrated experience. We take you all the way from installation to the deployment of your application. This Refcard is a must have for both users starting out with Java EE as well as senior architects.

WHAT IS JBOSS EAP 5?

JBoss Enterprise Application Platform is an open source implementation of the Java EE suite of services. It comprises a set of offerings for enterprise customers who are looking for preconfigured profiles of JBoss Enterprise Middleware components that have been tested and certified together to provide an integrated experience. Its easy-to-use server architecture and high flexibility make JBoss the ideal choice for users just starting out with Java EE, as well as senior architects looking for a customizable middleware platform.

Because it is Java-based, JBoss Enterprise Application Platform is cross platform, easy to install and use on any operating system that supports Java. The readily available source code is a powerful learning tool to debug the server and understand it. It also gives you the flexibility to create customized versions for your personal or business use.

Visit the http://www.jboss.com/products/community-enterprise/ website to download JBoss Enterprise Application Platform 5.

INSTALLING JBOSS EAP 5

1. Installation using the Graphical Installer

The graphical installer will guide you through the installation steps. Invoke the installer using the command:


java -jar enterprise-installer-5.0.1.GA.jar

2. Installation using the Zip Distribution

Use the unzip command:


unzip jboss-eap-5.0.1.GA.zip

Use the same steps to install the optional native package:


unzip jboss-eap-native-5.0.1.GA-<operating-system>-<arch>.zip

DIRECTORY STRUCTURE

Contents of jboss-as:

Directory     Description
bin           Contains startup, shutdown, and other system-specific scripts. All entry point JAR files and start scripts are here.
client        JAR files used by external Java client applications. Choose JAR files as required or use jbossall-client.jar.
common/lib    Shared JAR files common to profiles are here.
docs          XML DTDs and schemas for reference. Also example configuration files for setting up datasources (e.g. MySQL, Oracle, PostgreSQL).
lib           Contains server startup JARs; not intended to hold application JAR files.
server        Contains the JBoss server profile sets.

JBoss EAP 5 configuration profiles are located within the jboss-as/server directory (specified with "run -c PROFILE" ):

Profile       Description
default       Java EE 5 server profile. Has the most frequently used EE services. Does not include JAXR, IIOP, or clustering services.
all           Bundles all services (including clustering and RMI/IIOP).
production    Based on the "all" profile and tuned for production: log verbosity is reduced, deployment scanning runs every 60 seconds, memory usage is tuned to accommodate production deployment requirements, and authorization checks are required.
standard      Based on the "all" profile; a fully certified Java EE 5 configuration.
web           Lightweight configuration built around JBoss Web. Provides the services required for web application deployment and a subset of Java EE technologies. Does not include JBoss Transaction JTS or XTS, Enterprise Java Bean 1.x or 2.x capabilities, JBoss Messaging, JCA, or JBoss IIOP.
minimal       Bare minimum required to start. Includes logging, JNDI, and the URL deployment scanner. Use JMX/JBoss to start your own services. No web container, EJB, or JMS support is included.

ADMINISTRATION

To start the JBoss EAP server, change to the EAP_DIST/jboss-as/bin directory and set the JAVA_HOME environment variable. Then execute the run.bat (for Windows) or run.sh (for Linux, Unix, Mac OS X) script, as appropriate for your operating system.

Administering your JBoss EAP 5 server instance is easy with the administration consoles provided in this distribution. Once the server is started, simply point your browser to:


http://localhost:8080

This brings you to the consoles to manage your instance as well as links to on-line resource references.

Use the Administration Console for managing and monitoring a server instance. Deploy, undeploy, and update enterprise applications, persist configuration changes for Datasources, JMS topics and queues, Connection Factories, and Service Binding Manager, monitor standard JVM metrics, and view statistics and invoke operations on many other components.

The JMX Console provides a server view. It lists all registered services (MBeans). Administration Console shares the same username/password as JMX console.

Command-line scripts are available in the jboss-as/bin directory. In addition to scripts for starting and stopping the server, JBoss provides a command-line tool for interacting with a remote server instance. This tool is called twiddle (for twiddling bits via JMX) and is located in the bin directory. Twiddle is a command execution tool, not a general command shell. Run it using the twiddle.sh or twiddle.bat script. Passing the -h (--help) argument prints the basic syntax, and --help-commands lists the available commands. By default, twiddle connects to localhost on port 1099 and looks up the default jmx/rmi/RMIAdaptor binding of the RMIAdaptor service as the connector for communicating with the JMX server. To connect to a different server/port combination, use the -s (--server) option:


$ ./twiddle.sh -s servername serverinfo -d jboss
$ ./twiddle.sh -s servername:1099 serverinfo -d jboss

MANAGEMENT

The JBoss Operations Network (JON) management platform delivers centralized systems management that allows you to:

  • Coordinate stages of application’s life-cycle datasources and messaging services
  • Expose cohesive view of middleware components through complex environments
  • Improve operational efficiency/reliability through visibility into production availability and performance
  • Configure and roll out applications across complex environments through a single tool

APPLICATION DEPLOYMENT

There are multiple ways to deploy applications to JBoss:

Simplest way

  • Choose the server profile to which you want to deploy the application (let's consider the "default" server profile in this example)
  • Copy your application (for example: .war or .ear or a .jar file) to JBOSS_HOME/server/<profilename>/deploy folder.
  • Start the server:

./run.sh

  • JBoss will deploy your application when it boots up.
  • That's it!

Hot Tip

This approach does not require the server to be started when you are deploying your application. If you want to undeploy the application then just move (or delete) the application from the deploy folder. You can develop simple scripts (like an Ant script) to deploy the application to JBoss. All it takes is a file copy command.
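
Because deployment is just a file copy, it can also be scripted from Java. The sketch below is an illustration under stated assumptions: the class name is hypothetical, and the caller supplies the archive path and the deploy folder of the chosen profile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Deploys and undeploys an application archive by copying or deleting a file. */
final class FileCopyDeployer {
    /** Copies the archive into the deploy folder, replacing any previous copy. */
    static Path deploy(Path archive, Path deployDir) throws IOException {
        Path target = deployDir.resolve(archive.getFileName());
        return Files.copy(archive, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Removes the archive from the deploy folder; returns false if it was not there. */
    static boolean undeploy(String archiveName, Path deployDir) throws IOException {
        return Files.deleteIfExists(deployDir.resolve(archiveName));
    }
}
```

A call such as FileCopyDeployer.deploy(Paths.get("/home/me/myapp.war"), deployDir) would place the archive where the server picks it up, with deployDir pointing at the deploy folder of your profile.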

Using the admin console

  • Start the server

./run.sh

  • On the left hand side of the admin console page, under JBoss AS -> Applications, select the type of application you want to deploy. Let's consider a web application (.war) in this example
  • Click the "Web Application (WAR)" link
  • In the right side section, under the "Summary" tab, click on the "Add a new resource" button
  • "Browse" to the .war file to deploy (e.g. /home/me/myapp.war)
  • If you want to deploy the application in exploded format (instead of an archive), select the "Yes" radio button for the "Deploy Exploded" option
  • Click on the "Continue" button
  • On successful deployment, you will see your application listed in the Summary tab
  • To undeploy the application, click on the "Delete" button next to the application you want to undeploy, in the "Summary" tab

Hot deployment

JBoss has a built-in hot deployer which can:

  • Detect new applications in the deploy folder and trigger an application deployment
  • Detect an application which was removed from the deploy folder and trigger an application undeployment
  • Detect that the deployment descriptor of an application (for example, the web.xml of .war or application.xml of .ear) has changed and trigger an application redeployment

Hot Tip

The hot deployer is configured to run every X milliseconds. This value can be changed by changing the "scanPeriod" attribute in JBOSS_HOME/server/<profilename>/deploy/hdscanner-jboss-beans.xml:

<!-- Hot deployment of applications -->
<bean name="HDScanner" class="org.jboss.system.server.profileservice.hotdeploy.HDScanner">
    <property name="deployer"><inject bean="ProfileServiceDeployer"/></property>
    <property name="profileService"><inject bean="ProfileService"/></property>
    <!-- Scan interval in milliseconds -->
    <property name="scanPeriod">5000</property>
    <property name="scanThreadName">HDScanner</property>
</bean>

To disable hot deployment, remove hdscanner-jboss-beans.xml from the deploy folder or rename it to hdscanner-jboss-beans.xml.bak (.bak files are ignored).

Clustering

Getting started with JBoss clustering is very simple. If two JBoss server instances using the all configuration are started on the same network, those servers will detect each other and automatically form a cluster.

Initial Preparation

Preparing a set of servers to act as a JBoss cluster involves a few simple steps:

  • Install JBoss on all your servers.
  • For each node, determine the address to bind sockets to.
  • Ensure multicast is working. Make sure each server's networking configuration supports multicast and that multicast support is enabled for any switches or routers between your servers. JBoss clustering also offers nondefault configuration options that do not use multicast.
  • Determine a unique integer "ServerPeerID" for each node. JBoss Messaging requires that each node in a cluster has a unique integer id, known as a "ServerPeerID", that should remain consistent across server restarts. A simple 1, 2, 3, ..., x naming scheme is fine.

Launching Your Cluster

We'll look at two scenarios for doing this. In each scenario we'll be creating a two node cluster, where the ServerPeerID for the first node is 1 and for the second node is 2.

Scenario 1: Nodes on Separate Machines

On node1, to launch JBoss:


$ ./run.sh -c all -b 192.168.0.101 -Djboss.messaging.ServerPeerID=1

On node2, it's the same except for a different -b value and ServerPeerID:


$ ./run.sh -c all -b 192.168.0.102 -Djboss.messaging.ServerPeerID=2

The -c switch says to use the all config, which includes clustering support. The -b switch sets the address on which sockets will be bound. The -D switch sets the system property from which JBoss Messaging gets its unique id.

Scenario 2: Two Nodes on a Single, Non-Multihomed, Machine

Running multiple nodes on a single machine that only has a single IP address is a common scenario in a development environment. You need to be sure each server instance has its own work area. One way to do this is to simply make copies of the all configuration. For example, assuming the root of the JBoss distribution was unzipped to /var/jboss, you could:


$ cd /var/jboss/server
$ cp -r all node1
$ cp -r all node2

Two processes can't bind sockets to the same address and port, so we'll have to tell JBoss to use different ports for the two instances. This can be done by setting the jboss.service.binding.set system property.

To launch the first instance, open a console window and:


$ ./run.sh -c node1 -b 192.168.0.101 -Djboss.messaging.ServerPeerID=1 \
-Djboss.service.binding.set=ports-default

For the second instance, in a second console window:


$ ./run.sh -c node2 -b 192.168.0.101 -Djboss.messaging.ServerPeerID=2 \
-Djboss.service.binding.set=ports-01

This tells the ServiceBindingManager on the first node to use the standard set of ports (e.g. JNDI on 1099). The second node uses the "ports-01" binding set, in which each port is by default offset by 100 from the standard port number (e.g. JNDI on 1199).

Web Application Clustering Quick Start

Web application clustering involves two aspects: setting up an HTTP load balancer and telling JBoss to make the application’s user sessions HA. How to do the former depends on what load balancer you choose (mod_cluster is a good choice); the latter couldn't be simpler – just add the <distributable/> element to your application’s web.xml.
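
Once sessions are marked distributable, the objects you store in them generally need to be serializable so that they can be copied to other nodes. The following is a minimal sketch of how you might check that before deploying; ShoppingCart, ReplicationCheck and their members are hypothetical names used only for illustration, and the round trip uses plain Java serialization rather than JBoss's own replication code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session attribute; class and field names are illustrative only.
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    final String customerId;
    final int itemCount;

    ShoppingCart(String customerId, int itemCount) {
        this.customerId = customerId;
        this.itemCount = itemCount;
    }
}

class ReplicationCheck {
    // Serializes and deserializes a value with standard Java serialization.
    // If this fails, the value is unlikely to survive session replication.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value is not replicable: " + e, e);
        }
    }
}
```

A check like this fits naturally into a unit test for every class you intend to put into the HTTP session.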

EJB3 Session Bean Clustering Quick Start

To add load balancing and failover capabilities to your EJB3 session beans, simply add the org.jboss.ejb3.annotation.Clustered annotation to the bean class for your stateful or stateless bean:


@javax.ejb.Stateful
@org.jboss.ejb3.annotation.Clustered
public class MyBean implements MySessionInt {
	public void test() {
		// Do something cool
	}
}

PERFORMANCE AND TUNING

Identify the peak application workload and how it differs from the normal workload. When estimating peak workloads, don’t go by averages, as the peaks may be much higher than averages calculated over a period. Start testing with a low load and increase it until you reach the expected peak load. Tune until target performance is achieved. There are a number of possible performance optimizations. Always load test before and after making changes to verify the intended effect. Make one change at a time so it's clear which change has which effect.

Use OS monitoring tools to identify system performance bottlenecks. In a multiple machine installation, find the machine(s) that are the bottleneck.

Instrument the application for performance measurement (make this optional in production). Also, turn on container call statistics and Hibernate statistics.

Taking successive thread dumps shows what the server is doing over time. Do this before and after hitting a performance wall. For example, generate a thread dump once a minute during a five-minute performance problem and compare your findings. Use the "jps -l" command to get the Java process ids and then run the "jstack ProcessID" command, which generates the thread dump.
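
If you cannot run jstack against the server, a similar dump can be produced from inside the JVM. The sketch below is not the jstack tool itself; it uses the standard java.lang.management API instead, and the class name ThreadDumpSketch is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {
    // Builds a textual dump (name, state, stack) of all live threads in this JVM.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details to keep the dump cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Calling dump() on a timer and writing each result to its own file gives you the successive snapshots described above.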

The HotSpot Java Virtual Machine contains other information gathering tools which can help tune applications. More information is in http://java.sun.com/javase/technologies/hotspot/.

jmap generates a memory heap dump file that can easily be read by the Eclipse Memory Analyzer tool (http://www.eclipse.org/mat/). jstat shows details of the memory space.

Clustering Tuning

Ensuring Adequate Network Buffers

Inadequately sized network buffers can cause lost packets. Steps to increase max buffer sizes are OS specific. For Linux change /etc/sysctl.conf file:


# Allow a 25MB UDP receive buffer for JGroups
net.core.rmem_max = 26214400
# Allow a 1MB UDP send buffer for JGroups
net.core.wmem_max = 1048576
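
After changing the limits, it can be worth checking what a socket is actually granted. The following sketch asks for a 25MB UDP receive buffer and reports the size the operating system returns; the class name UdpBufferCheck is an assumption for this example, and the reported value is platform-specific (some systems report more than was requested, and a system with a lower limit reports less).

```java
import java.net.DatagramSocket;
import java.net.SocketException;

class UdpBufferCheck {
    // Requests a receive buffer of the given size and returns the size the OS reports back.
    static int grantedReceiveBuffer(int requestedBytes) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        } catch (SocketException e) {
            throw new IllegalStateException("could not open UDP socket: " + e, e);
        }
    }

    public static void main(String[] args) {
        int requested = 26214400; // same value as net.core.rmem_max above
        System.out.println("requested " + requested
                + " bytes, granted " + grantedReceiveBuffer(requested));
    }
}
```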

Isolating Intra-Cluster Traffic

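Isolate intra-cluster traffic from external request traffic. This requires multiple NICs on your server machines, with request traffic coming in on one NIC and intra-cluster traffic using another. In the following command, -b sets the address for request traffic and jgroups.bind_addr sets the address for intra-cluster traffic: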


./run.sh -c all -b 10.0.0.104 -Djgroups.bind_addr=192.168.100.104

JGroups Message Bundling

Message bundling queues small messages until a configurable number of bytes have accumulated and then sends them as one large message. Use of bundling can have significant performance benefits for high-volume asynchronous session replication. However, it is not enabled by default, as bundling can add significant latency to other types of intra-cluster traffic, particularly clustered Hibernate/JPA Second Level Cache traffic.

To use a JGroups channel with message bundling enabled, edit the <JBoss_Home>/server/<profilename>/deploy/cluster/jboss-cache-manager.sar/META-INF/jboss-cache-jboss-beans.xml file. For example, for the cache used by default for web sessions:


. . .
<!-- Standard cache used for web sessions -->
<entry><key>standard-session-cache</key>
<value>
	<bean name="StandardSessionCacheConfig" class="org.jboss.cache.config.Configuration">
. . .
<!-- Replace standard 'udp' JGroups stack with one that uses message
bundling -->
	<property name="multiplexerStack">udp-async</property>
. . .

For FIELD granularity web sessions, in the same file the same change can be made to the cache configuration with the field-granularity-session-cache key. For EJB3 stateful session beans, in the same file the same change can be made to the cache configuration with the sfsb-cache key.

Enabling Buddy Replication for Session Caches

In a cluster of more than two nodes, you can improve performance by enabling "buddy replication" in the web session and stateful session bean caches. With buddy replication, instead of replicating a copy of sessions to all nodes in the cluster, a copy is only replicated to a configurable number of "buddy" nodes.

Buddy replication is enabled by editing the <JBoss_Home>/server/<profilename>/deploy/cluster/jboss-cache-manager.sar/META-INF/jboss-cache-jboss-beans.xml file. For example, for the cache used by default for web sessions:


. . .
<!-- Standard cache used for web sessions -->
<entry><key>standard-session-cache</key>
<value>
	<bean name="StandardSessionCacheConfig" class="org.jboss.cache.config.Configuration">
. . .
<property name="buddyReplicationConfig">
	<bean class="org.jboss.cache.config.BuddyReplicationConfig">
<!-- Just set to true to turn on buddy replication -->
	<property name="enabled">true</property>
. . .

For FIELD granularity web sessions, in the same file the same change can be made to the cache configuration with the field-granularity-session-cache key. For EJB3 stateful session beans, in the same file the same change can be made to the cache configuration with the sfsb-cache key.

Reducing the Volume of Web Session Replication

Reducing the amount of data being replicated can improve performance. This can be accomplished both by avoiding replication when a request hasn't actually updated the session and by limiting replication to only the session data that has actually changed. See the discussion of replication-trigger and replication-granularity in "http://www.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/5.0.0/html/Administration_And_Configuration_Guide/clustering-http-state.html" for how to configure your application to limit the amount of data replicated.

Monitoring JGroups via JMX

When clustering services create a JGroups Channel to use for intra-cluster communication, the JMX console will include the following:

jboss.jgroups:cluster=<cluster_name>,protocol=UDP,type=protocol Statistics on two thread pools used to carry incoming messages up the channel’s protocol stack.

jboss.jgroups:cluster=<cluster_name>,protocol=UNICAST,type=protocol Information on lossless, ordered delivery of unicast (i.e. point-to-point) messages.

jboss.jgroups:cluster=<cluster_name>,protocol=NAKACK,type=protocol Information on lossless, ordered delivery of multicast (i.e. point-to-multipoint) messages.

jboss.jgroups:cluster=<cluster_name>,protocol=FC,type=protocol Information on ensuring fast message senders do not overwhelm slow receivers.

Web deployer

Other key configurations required for performance tuning of your Enterprise Application Platform include the <JBoss_Home>/server/<your_configuration>/deployers/jbossweb.deployer/server.xml file, which configures your HTTP request pool.

Thread pool

JBoss Enterprise Application Platform 5 has robust thread pooling, which should be sized appropriately. The server has a jboss-service.xml file in the <JBoss_Home>/server/<your_configuration>/conf directory that defines the system thread pool. There is a setting that defines the behavior if there isn't a thread available in the pool for execution. The default is to allow the calling thread to execute the task. You can monitor the queue depth of the system thread pool through the JMX Console, and determine from that if you need to make the pool larger.

TROUBLESHOOTING

Startup problems

If you are having trouble starting JBoss, the first thing to check is the JAVA_HOME environment variable. This should point to the home of your JDK (or JRE) installation. For example, if your JDK is installed at /opt/Java/jdk1.5.0 then JAVA_HOME should be set as follows:


JAVA_HOME=/opt/Java/jdk1.5.0

On Windows OS, if your JDK installation is at C:/Java/jdk1.5.0, you can set it as follows:


set JAVA_HOME=C:/Java/jdk1.5.0

Hot Tip

It's highly recommended not to install JBoss or Java in a folder containing a space in its path. For example, do not install Java at C:/Program Files/Java/jdk1.5.0 on Windows OS.

Logs

By default, JBoss is configured to log messages to the JBOSS_HOME/server/<servername>/log/server.log file. Check this file for exceptions and other informational logging. Logging levels can be controlled in JBOSS_HOME/server/<servername>/conf/jboss-log4j.xml.

Thread dumps

If you notice that the application is not responding, you can generate thread dumps to check what each thread is currently doing. Thread dumps can be generated in multiple ways, two of which are explained below:

From jmx-console

  • Access jmx-console (http://localhost:8080/jmx-console)
  • Look for the jboss.system:type=ServerInfo MBean and click on the link
  • On the page that comes up, look for the listThreadDump operation and click on the Invoke button

Using Twiddle:

  • From the command prompt, cd to JBOSS_HOME/bin folder
  • Run the following command:
    • ./twiddle.sh invoke "jboss.system:type=ServerInfo" listThreadDump > threads.html
    • Note: Use twiddle.bat for Windows OS

This command will generate the thread dump and redirect the output to threads.html (you can redirect it to any file of your choice)

It's best to generate several thread dumps a few seconds apart and compare them to find any blocked threads.
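
The comparison itself can be automated. The sketch below is an in-process alternative to reading the dumps by hand: it records which threads are in the BLOCKED state at two points in time and keeps those that appear in both snapshots. The class and method names are assumptions made for this example, and it relies only on the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.HashSet;
import java.util.Set;

class BlockedThreadFinder {
    // Returns the ids of all threads that are in the BLOCKED state right now.
    static Set<Long> blockedNow() {
        Set<Long> ids = new HashSet<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                ids.add(info.getThreadId());
            }
        }
        return ids;
    }

    // Returns the ids present in both snapshots, i.e. threads that stayed blocked.
    static Set<Long> blockedInBoth(Set<Long> first, Set<Long> second) {
        Set<Long> both = new HashSet<>(first);
        both.retainAll(second);
        return both;
    }
}
```

A thread that shows up in several consecutive snapshots is a good candidate for closer inspection in the full thread dump.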

Reporting problems

Search the community forums at http://community.jboss.org/ to see if someone else is experiencing the same problem. You can also obtain a support contract via http://www.jboss.com/services/subscriptions/ and then access https://www.redhat.com/wapps/sso/jboss/login.html?redirect=http%3A%2F%2Fsupport.redhat.com%2Fjbossnetwork

Search for existing issues that report the same problem. The JIRA issue search is at https://jira.jboss.org/jira/secure/IssueNavigator.jspa

When filling in the JIRA, be as precise as possible when reporting the bug. Include as much information as possible. Include the steps needed to reproduce the problem.

If possible, create a standalone test case that reproduces the bug that can be attached to a JIRA issue. The JIRA issue is more likely to be fixed if a unit test is attached (or at least a test case). Even better is if a solution, in the form of a patch (output of doing "svn diff > fix.patch"), is attached.

JBOSS EAP 5 VS AS 5

The focus of the JBoss AS project is continuous innovation, fueled by Open Source community collaboration, on the bleeding edge of Java Middleware, quickly bringing emerging standards and technology into the mainstream through its large user base and active community. This focus implies continuous change and a rapid release cycle with minor releases every one or two months and major releases every six months. Red Hat does not provide support for JBoss community projects.

Red Hat does provide world-class support, consulting and training for the JBoss Enterprise Platforms. JBoss Enterprise Platforms balance innovation with enterprise class stability by integrating the best of Open Source projects like JBoss AS. The JBoss Enterprise Platforms are certified against a broad range of Operating Systems, Databases and other 3rd party applications and tools; meet very strict industry security standards; are pre-tuned and secured by default so are ready to support your business critical applications and services.

Red Hat recommends JBoss projects as a place to get involved in shaping the Open Source middleware landscape and as a way to understand how the technology landscape is evolving. Red Hat recommends JBoss Enterprise Platforms for demanding business critical production workloads where security, performance, reliability and long-term, world-class support are imperative.

See this link for information on training http://www.jboss.com/services/training/

About The Authors

Scott Marlow

Core Engineer on the JBoss AS team. Over 20 years experience building enterprise development software, from database server to developer tools (such as PowerBuilder and four different application servers). Five years experience contributing to JBoss OSS projects { Application Server, Clustering, JGroups, JBoss Cache, Hibernate }. Scott enjoys coaching and playing soccer in his spare time.

Jaikiran Pai

Employed at RedHat and is part of the JBoss EJB3 development team. Jaikiran completed his graduation in 2004 and started working in a software company in Pune, India. In his role as a software developer, he was part of projects which involved Java and JavaEE. During this period, he developed interest in JBoss Application Server and started spending his spare time in JBoss community forums. In 2009, Jaikiran was offered a job at RedHat to be part of his favourite project - JBoss Application Server.

Jaikiran can often be found either at the JBoss forums or at his other favourite place http://www.javaranch.com/. Occasionally, Jaikiran blogs at http://www.jaitechwriteups.blogspot.com/

Shelly McGowan

Member of the JBoss Application Server development team. She has several years of software development experience most recently on Java Enterprise Edition technologies such as EJB and EJB 3 persistence.

Brian Stansberry

My background is in International Business and East Asian Studies, with a B.A. from Michigan State and an M.A. from Stanford. Before getting bitten by the software bug, I had a successful career in corporate finance in the semiconductor industry. Part of that oddly enough involved web application and other types of software development. But since I realized in the late 1990s that my true interest was in software, not finance, I've focused on server-side development and Java. I started working on JBoss in 2003 and joined the company in 2005. My other main interests are China (I speak Mandarin Chinese and visit China regularly) and hanging out with my family. I live in St. Louis, MO.

Expertise: JBoss AS Clustering, JBoss AS in general, JBoss Cache, PojoCache, JGroups, mod_cluster

Occupation: Lead, JBoss AS Clustering

Len DiMaggio

JBoss middleware QE engineer and team lead. Len is a frequent contributor to JBoss blogs and DZone (http://soa.dzone.com/users/ldimaggi).

Recommended Book

JBoss AS5 Development

This book will kick-start your productivity and help you to master JBoss AS development. The author's experience with JBoss enables him to share insights on JBoss AS development, in a clear and friendly way. By the end of the book, you will have the confidence to apply all the newest programming techniques to your JBoss applications.



JBoss in Action

JBoss in Action is the first book to focus on teaching readers in detail how to use the JBoss application server. Unlike other titles about JBoss, the authors of JBoss in Action go deeper into the advanced features and configuration of the server. In particular, it focuses on enterprise-class topics, such as high availability, security, and performance.


Designing Quality Software

Architectural and Technical Best Practices

By Alexander von Zitzewitz

23,476 Downloads · Refcard 130 of 151 (see them all)


The Essential Software Quality Cheat Sheet

Designing quality Software is covered in this cheat sheet, more specifically on the topic of large scale system design. The intention is to support architects and developers in solving typical day-to-day issues that can negatively impact technical quality and software structure. You’ll learn about coupling metrics, dependency management, and logical architecture. The card will also provide a comprehensive list of design rules, programming rules, testing rules, environment rules, and common sense rules that you can put into practice right now to start building better software.

Designing Quality Software: Architectural and Technical Best Practices

By Alexander von Zitzewitz

ABSTRACT

The technical quality of software can be defined as the level of conformance of a software system to a set of rules and guidelines derived from common sense and best practices. Those rules should cover software architecture (dependency structure), programming in general, testing and coding style.

Technical quality is fundamentally manifested in the source code. People say: “The truth can only be found in the source code”. Therefore it is important that achieving a satisfactory level of technical quality is an explicit goal and integral part of the development process. To avoid a steady decrease of technical quality during development, it must be measured on a regular basis (at least daily). By doing that it is possible to detect and address undesirable rule violations early in the process. The later rule violations are detected, the more difficult and expensive it is to fix them. Since testing is only one of several aspects of technical quality management, it is not possible to achieve an acceptable level of technical quality by testing only.

The document begins with a description of the biggest enemy of technical quality, which is the structural erosion of software. The best way to fight structural erosion is to keep the large-scale structure of a software system in good shape. Therefore the biggest part of this document focuses on large-scale system design, which also has big implications for application security aspects. Parts of this section are very technical. The intention is to support architects and developers in solving typical day-to-day issues that can negatively impact technical quality and software structure. The last part contains a compact set of rules derived from experience and real-world projects. Implementing and enforcing these rules will help you to achieve a good level of technical quality and maintainability while optimizing the productivity of your development team.

The intended audiences are software architects, developers, quality managers and other technical stakeholders. Although the major part of the document is programming language agnostic, the rule set at the end works best with statically typed object-oriented languages like Java, C# or C++.

STRUCTURAL EROSION

[Figure: cycle groups]

All software projects start with great hope and ambition. Architects and developers are committed to creating an elegant and efficient piece of software that is easy to maintain and fun to work on. Usually, they have a vital image of the intended design in their mind. As the code base gets larger, however, things start to change. The software is increasingly harder to test, understand, maintain and extend. In Robert C. Martin’s terms, “The software starts to rot like a piece of bad meat”.

This phenomenon is called “Structural Erosion” or “Accumulation of Structural Debt”, and it happens in almost every non-trivial software project. Usually, the erosion begins with minor deviations from the intended design due to changes in requirements, time pressure or just simple negligence. In the early stages of a project, this is not a problem; but during the later stages, the structural debt grows much faster than the code base. As a result of this process, it becomes much harder to apply changes to the system without breaking something. Productivity is decreasing significantly and the cost of change grows continuously up to a point where it becomes unbearable.

Robert C. Martin described a couple of well-known symptoms that can help you to figure out whether or not your application is affected by structural erosion:

  • Rigidity: the system is hard to change because every change forces many other changes.
  • Fragility: changes cause the system to break in conceptually unrelated places.
  • Immobility: it’s hard to disentangle the system into reusable components.
  • Viscosity: doing things correctly is harder than doing things incorrectly.
  • Opacity: the code is hard to read and understand. It does not express its intent well.

You would probably agree that those symptoms affect most non-trivial software systems in one way or another. Moreover, the symptoms get more severe the older a system is and the more people are working on it. The only way to avoid them in the first place is to have a battle plan against structural erosion integrated into the daily development process.

LARGE-SCALE SYSTEM DESIGN

Dependency Management

The large-scale design of a software system is manifested by its dependency structure. Only by explicitly managing dependencies over the complete software lifecycle is it possible to avoid the negative side effects of structural erosion. One important aspect of dependency management is to avoid cyclic compile-time dependencies between software components:

[Figure: Case 1 (cyclic dependency) and Case 2 (acyclic dependency)]

Case 1 shows a cyclic dependency between units A, B and C. Hence, it is not possible to assign level numbers to the units, leading to the following undesirable consequences:

  • Understanding the functionality behind a unit is only possible by understanding all units.
  • The test of a single unit implies the test of all units.
  • Reuse is limited to only one alternative: to reuse all units. This kind of tight coupling is one of the reasons why reuse of software components is hardly ever practiced.
  • Fixing an error in one unit automatically involves the whole group of three units.
  • An impact analysis of planned changes is difficult.

Case 2 represents three units forming an acyclic directed dependency graph. It is now possible to assign level numbers. The following effects are the consequences:

  • A clear understanding of the units is achieved by having a clear order, first A, then B and then C.
  • A clear testing order is obvious: first test unit A; test continues with B and afterwards with C.
  • In matter of reuse, it is possible to reuse A isolated, A and B, or also the complete solution.
  • To fix a problem in unit A, it can be tested in isolation, whereby the test verifies that the error is actually repaired. For testing unit B, only units B and A are needed. Subsequently, real integration tests can be done.
  • An impact analysis can easily be done.

Please keep in mind that this is a very simple example. Many software systems have hundreds of units. The more units you have, the more important it becomes to be able to levelize the dependency graph. Otherwise, maintenance becomes a nightmare.
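
To make the idea of level numbers concrete, here is a minimal sketch of how a dependency graph might be levelized. It assumes that a unit with no dependencies gets level 1 and that every other unit sits one level above its highest dependency; the class name Levelizer and the map-based graph representation are choices made for this example. A cyclic graph like Case 1 cannot be levelized, so the sketch reports it with an exception.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Levelizer {
    // Maps each unit to its level (1 = no dependencies).
    // Throws IllegalStateException if a cyclic dependency makes levelization impossible.
    static Map<String, Integer> levelize(Map<String, List<String>> deps) {
        Map<String, Integer> levels = new HashMap<>();
        Set<String> inProgress = new HashSet<>();
        for (String unit : deps.keySet()) {
            level(unit, deps, levels, inProgress);
        }
        return levels;
    }

    private static int level(String unit, Map<String, List<String>> deps,
                             Map<String, Integer> levels, Set<String> inProgress) {
        Integer known = levels.get(unit);
        if (known != null) {
            return known;
        }
        // Meeting a unit that is still being resolved means we walked around a cycle.
        if (!inProgress.add(unit)) {
            throw new IllegalStateException("cyclic dependency involving " + unit);
        }
        int highestDependency = 0;
        for (String dep : deps.getOrDefault(unit, Collections.<String>emptyList())) {
            highestDependency = Math.max(highestDependency, level(dep, deps, levels, inProgress));
        }
        inProgress.remove(unit);
        levels.put(unit, highestDependency + 1);
        return highestDependency + 1;
    }
}
```

With B depending on A and C depending on B, the units come out at levels 1, 2 and 3, which is exactly the order in which they can be understood and tested.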

Hot Tip

Here is what recognized software architecture experts say about dependency management:

“It is the dependency architecture that is degrading, and with it the ability of the software to be maintained.” [ASD]

“The dependencies between packages must not form cycles.” [ASD]

“Guideline: No Cycles between Packages. If a group of packages have cyclic dependencies then they may need to be treated as one larger package in terms of a release unit. This is undesirable because releasing larger packages (or package aggregates) increases the likelihood of affecting something.” [AUP]

“Cyclic physical dependencies among components inhibit understanding, testing and reuse.” [LSD]

Coupling Metrics

Another important goal of dependency management is to minimize the overall coupling between different parts of the software. Lower coupling means higher flexibility, better testability, better maintainability and better comprehensibility. Moreover, lower coupling also means that changes only affect a smaller part of an application, which greatly reduces the probability for regression bugs.

To control coupling, it is necessary to measure it. [LSD] describes two useful coupling metrics. Average Component Dependency (ACD) tells us how many components a randomly picked component depends on, on average (including itself). Normalized Cumulative Component Dependency (NCCD) compares the coupling of a dependency graph (application) with the coupling of a balanced binary tree.

[Figure: Graph 1 (with a cyclic dependency) and Graph 2 (cycle broken)]

Above, you see two dependency graphs. The numbers inside of the components reflect the number of components reachable from the given component (including itself). The value is called Component Dependency (CD). If you add up all the numbers in the Graph 1 the sum is 23. This value is called “Cumulative Component Dependency” (CCD). If you divide CCD by the number of components in the graph, you get ACD. For Graph 1, this value would be 3.29.

Please note that Graph 1 contains a cyclic dependency. In Graph 2, removing the dependency shown in red has broken the cycle, which reduces the CCD to 19 and ACD to 2.71. As you can see, breaking cycles definitely helps to achieve our second goal, which is the overall reduction of coupling.

NCCD is calculated by dividing the CCD value of a dependency graph by the CCD value of a balanced binary tree with the same number of nodes. Its advantage over ACD is that the metric value does not need to be put in relation to the number of nodes in the graph: an ACD of 50 is high for a system with 100 elements but quite low for a system with 1,000 elements.
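
The metrics are easy to compute once the dependency graph is available. The sketch below shows one way to do it; the class CouplingMetrics and the seven-component sample graph are hypothetical. The sample graph is not the one from the figures, but it was chosen so that its totals coincide with the values quoted above (CCD 23 with the cycle, 19 without). The balanced binary tree is modeled heap-style, with each node depending on its two children.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CouplingMetrics {
    // Component Dependency: number of components reachable from start, including itself.
    static int cd(String start, Map<String, List<String>> deps) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String unit = todo.pop();
            if (seen.add(unit)) {
                for (String dep : deps.getOrDefault(unit, Collections.<String>emptyList())) {
                    todo.push(dep);
                }
            }
        }
        return seen.size();
    }

    // Cumulative Component Dependency; every component must appear as a key in deps.
    static int ccd(Map<String, List<String>> deps) {
        int sum = 0;
        for (String unit : deps.keySet()) {
            sum += cd(unit, deps);
        }
        return sum;
    }

    static double acd(Map<String, List<String>> deps) {
        return (double) ccd(deps) / deps.size();
    }

    // CCD of a balanced binary tree with n nodes (children of node i are 2i+1 and 2i+2).
    static int balancedTreeCcd(int n) {
        int[] size = new int[n];
        int sum = 0;
        for (int i = n - 1; i >= 0; i--) {
            size[i] = 1;
            if (2 * i + 1 < n) size[i] += size[2 * i + 1];
            if (2 * i + 2 < n) size[i] += size[2 * i + 2];
            sum += size[i];
        }
        return sum;
    }

    static double nccd(Map<String, List<String>> deps) {
        return (double) ccd(deps) / balancedTreeCcd(deps.size());
    }

    // Hypothetical sample graph; the edge E -> B closes a cycle between B and E.
    static Map<String, List<String>> sampleGraph(boolean withCycle) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("A", Arrays.asList("B", "C"));
        deps.put("B", Arrays.asList("D", "E"));
        deps.put("C", Arrays.asList("E", "F"));
        deps.put("D", Collections.<String>emptyList());
        deps.put("E", withCycle ? Arrays.asList("B") : Collections.<String>emptyList());
        deps.put("F", Arrays.asList("G"));
        deps.put("G", Collections.<String>emptyList());
        return deps;
    }
}
```

For this sample, a balanced binary tree with seven nodes has a CCD of 17, so breaking the cycle lowers the NCCD from 23/17 to 19/17.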

Detecting and Breaking Cyclic Dependencies

Agreeing that it is a good idea to avoid cyclic compile-time dependencies is one thing. Finding and breaking them is another story.

The only real option to find them is to use a dependency analysis tool. For Java, there is a simple free tool called “JDepend” [JDP]. If your project is not very big, you can also use the free “Community Edition” of “SonarJ” [SON], which is much more powerful than JDepend. For bigger projects you need to buy a commercial license of SonarJ. If you are not using Java or look for more sophisticated features like cycle visualization and breakup proposals, you will have to look at commercial tools.

After having found a cyclic dependency, you have to decide how to break it. Code refactorings can break any cyclic compile-time dependency between components. The most frequently used refactoring to do that is the addition of an interface. The following example shows an undesirable cyclic dependency between the “UI” component and the “Model” component of an application:

[Figure: cyclic dependency between the “UI” and “Model” components]

The example above shows a cyclic dependency between “UI” and “Model”. Now it is not possible to compile, use, test or understand the “Model” component without also having access to the “UI” component. Note that even though there is a cyclic dependency on the component level, there is no cyclic dependency on the type level.

Adding the interface “IAlarmHandler” to the “Model” component solves the problem, as shown in the next diagram:

[Figure: the “IAlarmHandler” interface added to the “Model” component]

Now, the class “AlarmHandler” simply implements the interface defined in the “Model” component. The direction of the dependency is inverted by replacing a “uses” dependency with an inverted “implements” dependency. That is why this technique is also called the “dependency inversion principle”, first described by Robert C. Martin [ASD]. Now, it is possible to compile, test and comprehend the “Model” component in isolation. Moreover, it is possible to reuse the component by just implementing the “IAlarmHandler” interface. Please note that even if this method works pretty well most of the time, the overuse of interfaces and callbacks can also have undesirable side effects like added complexity. Therefore, the next example shows another way to break cycles. In [LSD], you will find several additional programming techniques to break cyclic dependencies.
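
The interface-based refactoring described above might look roughly like this in Java. Only the names IAlarmHandler, AlarmHandler, “Model” and “UI” come from the example; the Model class, the method names and the alarm rule are hypothetical, and in a real project the two components would live in separate packages rather than in one file.

```java
import java.util.ArrayList;
import java.util.List;

// --- "Model" component: defines the interface it needs and knows nothing about the UI ---
interface IAlarmHandler {
    void handleAlarm(String message); // hypothetical method name
}

class Model {
    private final IAlarmHandler handler;

    Model(IAlarmHandler handler) {
        this.handler = handler;
    }

    // Hypothetical rule: raise an alarm when a reading exceeds its limit.
    void check(int reading, int limit) {
        if (reading > limit) {
            handler.handleAlarm("reading " + reading + " exceeds limit " + limit);
        }
    }
}

// --- "UI" component: depends on the model by implementing its interface ---
class AlarmHandler implements IAlarmHandler {
    final List<String> shown = new ArrayList<>();

    @Override
    public void handleAlarm(String message) {
        shown.add(message); // a real UI would display the message instead
    }
}
```

Because Model only sees the interface, a test can pass in a trivial handler (for example a lambda) and exercise the model without any UI code on the classpath.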

Hot Tip

In C++, you can mimic interfaces by writing a class that contains pure virtual functions only.

Sometimes, you can break cycles by rearranging features of classes. The following diagram shows a typical case:

[Figure: “Order” and “Customer” classes referencing each other across packages]

The “Order” class references the “Customer” class. The “Customer” class also references the “Order” class over the return value of a convenience method “listOrders()”. Since both classes are in different packages, this creates an undesirable cyclic package dependency.

[Figure: “listOrders()” moved to the “Order” class]

The problem is solved by moving the convenience method to the “Order” class (while converting it into a static method). In situations like this, it is helpful to levelize the components involved in the cycle. In the example, it is quite natural to assume that an order is a higher-level object than a customer. Orders need to know the customer, but customers do not need orders. As soon as levels are established, you simply need to cut all dependencies from lower-level objects to higher-level objects. In our example, that is the dependency from “Customer” to “Order”.
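
In code, the result of this refactoring could look like the following sketch. The class and method names follow the example; the fields, the constructor signatures and the extra allOrders parameter (a stand-in for wherever orders are really stored) are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Lower-level class: has no reference to Order at all.
class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

// Higher-level class: depends on Customer, never the other way round.
class Order {
    final Customer customer;
    final String item;

    Order(Customer customer, String item) {
        this.customer = customer;
        this.item = item;
    }

    // Formerly a convenience method on Customer; now a static method on Order.
    static List<Order> listOrders(Customer customer, List<Order> allOrders) {
        List<Order> result = new ArrayList<>();
        for (Order order : allOrders) {
            if (order.customer == customer) {
                result.add(order);
            }
        }
        return result;
    }
}
```

Callers that used to write customer.listOrders() now write Order.listOrders(customer, ...), which is a small price for removing the cyclic package dependency.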

It is important to mention that we do not look at runtime (dynamic) dependencies here. For the purpose of large-scale system design, only compile-time (static) dependencies are relevant.

Hot Tip

The usage of Inversion of Control (IOC) frameworks like the Spring Framework [SPG] will make it much easier to avoid cyclic dependencies and to reduce coupling.

Logical Architecture

Actively managing dependencies requires the definition of a logical architecture for a software system. A logical architecture groups the physical (programming language) level elements like classes, interfaces or packages (directories or name spaces in C# and C++) into higher-level architectural artifacts like layers, subsystems or vertical slices.

A logical architecture defines those artifacts, the mapping of physical elements (types, packages, etc.) to those artifacts and the allowed and forbidden dependencies between the architectural artifacts.

[Figure: Example of a logical architecture with layers and slices]

Here is a list of architectural artifacts you can use to describe the logical architecture of your application:

  • Layer: You cut your application into horizontal slices (layers) by using technical criteria. Typical layer names would be “User Interface”, “Service”, “DAO”, etc.
  • Vertical slice: While many applications use horizontal layering, most software architects neglect the clear definition of vertical slices. Functional aspects should determine the vertical organization of your application. Typical slice names would be “Customer”, “Contract”, “Framework”, etc.
  • Subsystem: A subsystem is the smallest of the architectural artifacts. It groups together all types implementing a specific mostly technical functionality. Typical subsystem names would be “Logging”, “Authentication”, etc. Subsystems can be nested in layers and slices.
  • Natural Subsystem: The intersection between a layer and a slice is called a natural subsystem.
  • Subproject: Sometimes projects can be grouped into several inter-related subprojects. Subprojects are useful to organize a large project on the highest level of abstraction. It is recommended not to have more than seven to ten subprojects in a project.

You can nest layers and slices, if necessary. However, for reasons of simplicity, using more than one level of nesting is not recommended.

Mapping of code to architectural artifacts

To simplify code navigation and the mapping of physical entities (types, classes, packages) to architectural artifacts, it is highly recommended to use a strict naming convention for packages (namespaces or directories in C++ or C#). A proven best practice is to embed the name of architectural artifacts in the package name.

For example, you could use the following naming convention:


com.company.project.[subproject].slice.layer.[subsystem]…

Parts in square brackets are optional. For subsystems not belonging to any layer or slice, you can use:


com.company.project.[subproject].subsystem…

Of course, you need to adapt this naming convention if you use nesting of layers or slices.
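As a rough illustration of why such a convention is useful, the following Python sketch maps a package name back to its architectural artifacts. The layer and slice names, the `com.company.project` prefix and the `locate` helper are assumptions invented for this example; a real project would plug in its own names, and a dependency management tool may do this quite differently.

```python
from typing import NamedTuple, Optional

# Hypothetical artifact names; every project defines its own.
PREFIX = "com.company.project"
SLICES = {"customer", "contract", "framework"}
LAYERS = {"ui", "service", "dao"}


class Location(NamedTuple):
    subproject: Optional[str]
    slice: Optional[str]
    layer: Optional[str]
    subsystem: Optional[str]


def locate(package: str) -> Location:
    """Map a package name to architectural artifacts via the naming convention."""
    if not package.startswith(PREFIX + "."):
        raise ValueError(f"{package} is not below {PREFIX}")
    parts = package[len(PREFIX) + 1:].split(".")

    # An optional subproject is a leading part that is not a slice name.
    subproject = None
    if parts[0] not in SLICES and len(parts) > 1:
        subproject = parts.pop(0)

    if parts[0] in SLICES:
        # Pattern: [subproject].slice.layer.[subsystem]
        if len(parts) < 2 or parts[1] not in LAYERS:
            raise ValueError(f"slice '{parts[0]}' must be followed by a layer")
        subsystem = parts[2] if len(parts) > 2 else None
        return Location(subproject, parts[0], parts[1], subsystem)

    # Pattern: [subproject].subsystem (belongs to no layer or slice)
    return Location(subproject, None, None, parts[0])


print(locate("com.company.project.customer.service"))
print(locate("com.company.project.billing.logging"))
```

Because the artifact names are embedded in the package name, the mapping needs no extra configuration beyond the lists of layers and slices.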

Hot Tip

Dangerous Attitude: “If it ain’t broken, don’t fix it!” Critics of dependency and quality management usually use the above statement to portray active dependency and quality management as a waste of time and money. Their argument is that there is no immediate benefit in spending time and resources to fix rule violations just to improve the inner quality of an application. It is hard to argue against that if you have a very short time horizon. But if you expand the time horizon to the lifetime of an application, technical quality is the most important factor driving developer productivity and maintenance cost. This shortsighted thinking is one of the major reasons why so many medium- to large-scale applications are so hard to maintain. Many costly project failures can also be clearly associated with a lack of technical quality.

Application Security Aspects

Most people don’t think about the connection between application security and the architecture (dependency structure) of an application. But experience shows that potential security vulnerabilities are much more frequent in applications that suffer from structural erosion. The reason for that is quite obvious: if the dependency structure is broken and full of cycles, it is much harder to follow the flow of tainted data (un-trusted data coming from the outside) inside of the application. Therefore, it is also much harder to verify whether or not these data have been properly validated before they are being processed by the application.

On the other hand, if your application has a well-defined logical architecture that is reflected by the code, you can combine architectural and security aspects by designating architectural elements as safe or unsafe. “Safe” means that no tainted data are allowed within this particular artifact. “Unsafe” means that data flowing through the artifact is potentially tainted. To make an element safe, you need to ensure two things:

  • The safe element should not call any APIs that return potentially tainted data (IO, database access, HTTP session access, etc.). If this should be necessary for any reason, all data returned by those APIs must be validated.
  • All entry points must be protected by data validation.

This is much easier to check and enforce (with a dependency management tool) than having to assume that the whole code base is potentially unsafe. The dependency management tool plays an important role in ensuring the safety of an element by verifying that all incoming dependencies only use the official entry points. Incoming dependencies bypassing those entry points would be marked as violations.
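The entry-point check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dependency list, the `app.web` and `app.business` element names and the `entry_point_violations` helper are hypothetical, and a real dependency management tool would extract the dependencies from the code itself.

```python
def inside(name: str, element: str) -> bool:
    """True if a type or package name belongs to the given architectural element."""
    return name == element or name.startswith(element + ".")


def entry_point_violations(dependencies, safe_element, entry_points):
    """Return incoming dependencies that bypass the official entry points.

    dependencies: iterable of (source, target) pairs of fully qualified names.
    """
    violations = []
    for source, target in dependencies:
        crosses_boundary = inside(target, safe_element) and not inside(source, safe_element)
        if crosses_boundary and target not in entry_points:
            violations.append((source, target))
    return violations


# Hypothetical dependencies: the web layer is "unsafe", the business layer is "safe".
dependencies = [
    ("app.web.OrderController", "app.business.OrderFacade"),
    ("app.web.OrderController", "app.business.internal.PriceCalculator"),
    ("app.business.OrderFacade", "app.business.internal.PriceCalculator"),
]
violations = entry_point_violations(
    dependencies,
    safe_element="app.business",
    entry_points={"app.business.OrderFacade"},
)
print(violations)
```

Only the dependency that reaches from the web layer directly into the internals of the business layer is reported; calls within the safe element and calls through the validating entry point are allowed.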

Of course, the actual data processing should only be done in “safe” architectural elements. Typically, you would consider the Web layer as “unsafe”, while the layers containing the business logic should all be “safe” layers.

Since many applications suffer from more or less severe structural erosion, it is quite difficult to harden them against potential security threats. In that case, you can either try to reduce the structural erosion and create a “safe” processing kernel using a dependency management tool, or rely on expensive commercial software security analysis tools specialized in finding potential vulnerabilities. While the first approach will cost you more time and effort in the short term, it will pay off nicely by actually improving the maintainability and security of the code. The second approach is more like a short-term patch that does not resolve the underlying cause, which is the structural erosion of the code base.

COMMON SENSE RULES

The best way to achieve a high level of technical quality is the combination of a small set of rules and an automated, tool-based approach to rule checking and enforcement (see rule T2 for recommendations). In general, the rules should be checked automatically at least during the nightly build. If possible, a rule checker should also be part of the developer environment, so that developers can detect rule violations even before committing changes to the VCS [SON].

The recommended set of rules is, therefore, minimalistic by intention and can be customized when needed. Experience shows that it is always a good idea to keep the number of rules small because that makes it much easier to check and enforce the rules in the development process. The more rules you add, the less additional benefit will be provided by each additional rule. The rule set presented here is based on common sense and experience and already has been successfully implemented by many software development teams.

Unfortunately this document does not leave enough space to explain the rules in more detail. Please refer to the reference section at the end for more information. Some of the rules might seem arbitrary. In that case you can assume that they are derived from common sense and best practices. And of course you are free to adjust thresholds and rules to better match your specific environment.

Rules fall into three priority classes:

Major Rule: Must always be followed.
Minor Rule: It is highly recommended to follow this rule. If this is not possible or desirable, you must document the reason.
Guideline: It is recommended to follow this rule.

DESIGN RULES

These rules cover large-scale architectural aspects of the system.

Major Rules
D1: Define a cycle-free logical architecture for your application. Only by having a well-defined and cycle-free application can you have a chance to avoid structural erosion in the first place.
D2: Define a strict and clear naming convention for types and packages based on your logical architecture. The naming convention also defines the mapping between your code and the logical architecture and greatly simplifies code navigation and comprehension. In C++ or C# you should replace package with namespace or directory.
D3: The code must respect the logical architecture. This rule is ideally enforced by a tool. Basically, the tool has to ensure that all dependencies in your application conform to the logical architecture defined in D1.
D4: Package dependencies must not form cycles. The undesirable effects of cyclic dependencies have been discussed in detail before.
D5: NCCD of compilation units must not be bigger than 7. This rule corresponds with our goal to keep coupling small. If this value grows over the threshold, you should isolate layers and subsystems by only letting them have interfaces as entry points. Breaking cyclic dependencies can also shrink this metric considerably.
Minor Rules
D6: Keep security aspects in mind when creating a logical architecture. Plan for application security from the beginning. Designate “safe” and “unsafe” (data are potentially tainted) architectural elements. Keep the boundary between safe and unsafe elements as narrow as possible so that it is easy to verify that all incoming data are validated properly.
D7: Separate technical aspects from domain aspects on the logical architecture level. Separating these two aspects is the most promising approach to maintain healthy software. Technical aspects may shift in the near future. Business abstractions and their related logic are more likely to be stable. The Spring Framework implements a very good approach to separate business aspects from technical aspects [SPG].
D8: Use consistent handling of exceptions. Exception handling should be done in a consistent way by having answers for basic questions like “What are exceptions?” and “What information about errors should be written and where to?”. Low-level exceptions should not be visible in non-technical layers. Instead, they should be semantically transposed corresponding to their level. This can also prevent tight coupling to implementation details.
Guidelines
D9: Dependencies between compilation units must not form cycles. A general discussion is provided in [LSD].
D10: Use design patterns and architectural styles. Design patterns and architectural styles reuse proven and tested concepts. Design patterns also establish a standardized language for common design situations. Therefore, the use of design patterns is highly recommended where possible and useful [DES].
D11: Do not reinvent the wheel. Use existing designs and implementations where possible. Sometimes it is not obvious at first sight how many errors you can produce with your own implementation. Every line of code not written is a criterion of quality of the system and makes maintenance easier.
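Rules D4 and D9 lend themselves to automated checking. The following Python sketch shows one way a checker might detect a dependency cycle with a depth-first search. It is a minimal illustration, not how any particular tool works: the `find_cycle` function and the package names are hypothetical, and the dependency graph is assumed to have been extracted from the code already.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if there is none.

    graph maps each package to the packages it depends on.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []  # packages on the current depth-first search path

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for dependency in graph.get(node, ()):
            dependency_state = state.get(dependency, unvisited)
            if dependency_state == in_progress:
                # The dependency is already on the current path: a cycle.
                return path[path.index(dependency):] + [dependency]
            if dependency_state == unvisited:
                cycle = visit(dependency)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in graph:
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical package graphs.
layered = {"ui": ["service"], "service": ["dao"], "dao": []}
eroded = {"ui": ["service"], "service": ["dao"], "dao": ["ui"]}
print(find_cycle(layered))
print(find_cycle(eroded))
```

A build script could call such a check and break the build whenever a cycle is reported.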

PROGRAMMING RULES

Major Rules
P2: Declare class and instance variables as private. All modifiable or non-primitive class and instance variables are to be defined as private. This enhances the separation between interface and implementation [LSD].
Minor Rules
P3: Never catch “Throwable” or “Error” (Java). Catching exceptions of type “Throwable” or “Error” (including subclasses) violates the basic idea of the design of J2SE. Only provide exception handling for the type Exception.
P4: Avoid empty catch blocks. Empty catch blocks prevent useful error handling. At a minimum, a comment and perhaps a configurable log output is required in situations where catching the specified exception is not critical. The system should remain in a legal state.
P5: Limit the access to types and methods. Declaring all types and methods as public is easy but probably not what you want. Only make types and methods visible if they are supposed to be seen from the outside [LSD].
P6: Restrict extendibility - use final for types and methods (Java, C#). The final keyword (sealed in C#) states that a class is not intended for subclassing. In the case of methods, it makes clear that they should not be overridden. By default, everything should be final. Make everything final unless you explicitly want to allow overriding of behavior by subclassing.
P7: Provide minimal documentation for types. Focus on the description of the responsibilities of types. If it is possible to easily and precisely phrase the responsibilities, then this is a clear indicator for an adequate abstraction. See also the “Single Responsibility Principle” [ASD].
P8: Number of types in a package must not exceed 50. Grouping types with related responsibilities helps maintain a clear physical structure. A package is a cohesive unit of physical design with an overall responsibility. Overloaded packages have a good chance to cause excessive cycles in the physical design.
P9: Lines of code (compilation unit) must not exceed 700. Large compilation units are hard to maintain. Furthermore, they often violate the idea of clear abstractions and lead to significantly increased coupling.
P10: Number of method parameters must not exceed 7. A high number of method parameters may be an indicator of procedural design. The sheer number of possible parameter combinations may result in complex method implementations.
P11: Cyclomatic Complexity must not exceed 20. The Cyclomatic Complexity (CCN) measures the number of linearly independent control paths through a method. If a method has a lower CCN, it is easier to understand and to test. See [CCN] for a formal definition.
P12: Use assertions. Use “assert” (Debug.Assert for C#) in order to ensure preconditions, post-conditions and invariants in the “Design by contract” style [TOS]. It is also important to verify that assertions are never used to validate data coming from the user or from external systems.
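The distinction made in rule P12 can be illustrated with a short Python sketch; Python's `assert` statement plays the same role here as `assert` in Java, and like Java assertions it can be disabled at run time (in Python with the `-O` option). The function names and the quantity limits are assumptions chosen only for this example.

```python
def parse_quantity(raw: str) -> int:
    """Validate untrusted input with explicit checks, never with assert."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"quantity must be a whole number, got {raw!r}")
    quantity = int(raw)
    if not 1 <= quantity <= 100:  # hypothetical business limit
        raise ValueError("quantity must be between 1 and 100")
    return quantity


def total_price(unit_price_cents: int, quantity: int) -> int:
    """Internal computation: assertions document and check the contract."""
    # Preconditions: callers inside the application must already guarantee these.
    assert unit_price_cents >= 0, "unit price must not be negative"
    assert quantity >= 1, "quantity must be positive"

    total = unit_price_cents * quantity

    # Postcondition: the total can never be less than a single unit.
    assert total >= unit_price_cents
    return total


print(total_price(250, parse_quantity("3")))
```

The validation in `parse_quantity` must keep working even when assertions are switched off, which is why it raises an exception instead of asserting.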

TEST AND ENVIRONMENT RULES

Major Rules
T1: Use a version control system. This rule should speak for itself. It is impossible to write reliable software without being able to track and synchronize changes.
T2: Set up a build server and measure rule compliance. It should be possible to build your system completely and independently of your IDE. For Java, we recommend the use of Maven, Ivy or Ant. Integrate as many rule checkers as possible into your build script, so that the rules mentioned here can be checked completely automatically. Structural checks have higher priority than other checks because structural problems are much harder to repair once they spread over your application. Ideally, severe rule violations should break the build.
A popular recommendation for setting up an automated build environment is the usage of the Hudson build server [HUD] together with Sonar [SNR] and SonarJ [SON]. Hudson is programming language agnostic, while Sonar is currently expanding its support for other languages. A free SonarJ plug-in is available for Sonar.
T3: Write unit tests together with your code. Additionally, make sure that all unit tests are at least executed during the nightly build, ideally with every build. This way, you get early feedback when changes lead to regression bugs. While executing the tests, test tools usually also measure your test coverage. Make sure that all complex parts of your application are covered by tests.
T4: Define tests based on the logical architecture. Test design should consider the overall logical architecture. The creation of unit tests for all “Data Transfer Objects” instead of testing classes that provide business logic is useless. A project should establish clear rules on what has to be tested as a minimum instead of doing “blind” test creation.
Recommended rules are:
  • Provide unit tests for all business related objects. We want to test the business logic in isolation.
  • Provide unit tests for published interfaces.
The overall goal is to have good direct and indirect test coverage.
Minor Rules
T5: Use collaboration tools like issue trackers and wikis. Use an issue tracker to track problems and planned changes. Document all major design concepts and abstractions of your application in a wiki.

CONCLUSION

If you are beginning a new project, working on an existing project, or wanting to improve the development process in your organization, this Refcard is meant to be a good starting point. You can expect significant improvement with regard to developer productivity, application maintainability and technical quality if you implement and enforce the majority of the rules described above. Although this will cost you effort in the beginning, the overall savings are much bigger than the initial effort. Therefore, the adoption of design and quality rules is not merely “nice to have” but mandatory for every professional software development organization.

References

[ASD] Agile Software Development, Robert C. Martin, Prentice Hall 2003
[AUP] Applying UML And Patterns, Craig Larman, Prentice Hall 2002
[LSD] Large-Scale C++ Software Design, John Lakos, Addison-Wesley 1996
[DES] Design Patterns, Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Addison-Wesley 1994
[TOS] Testing Object-Oriented Systems, Robert V. Binder, Addison-Wesley 2000
[JDP] http://www.clarkware.com/software/JDepend.html
[SON] http://www.hello2morrow.com/products/sonarj
[SPG] http://www.springsource.org
[CCN] http://en.wikipedia.org/wiki/Cyclomatic_complexity
[HUD] http://hudson-ci.org
[SNR] http://www.sonarsource.org

About The Authors

Alexander von Zitzewitz

Alexander von Zitzewitz is the founder and managing director of hello2morrow GmbH and CEO of the US subsidiary. He has more than 20 years of project and management experience. In 1993 he founded ootec, a company focused on project services around object-oriented software technology. During this time, he worked as lead architect on several medium to large C++ and Java projects. This company was sold to the French Valtech group in March 2000 and is serving customers like Siemens, BMW, Thyssen-Krupp-Stahl and other well-known names in German industry. From 2003 to early 2005, he worked as Director of Central Europe for a French software vendor. In early 2005, he founded hello2morrow in Germany with the vision to create a new product for managing architecture and technical quality of software systems written in Java. The first version of this product, called “SonarJ”, was released in late summer 2005. Since the summer of 2008, he has been living in Massachusetts. His areas of expertise are object-oriented system design, integrating technical quality into software development processes and large-scale system architecture. Alexander has a degree in Computer Science from the Technical University of Munich.

Recommended Book

Design Patterns

Written by a software developer for software developers, this book is a unique collection of the latest software development methods. The author includes OOD, UML, Design Patterns, Agile and XP methods with a detailed description of a complete software design for reusable programs in C++ and Java. Using a practical, problem-solving approach, it shows how to develop an object-oriented application, from the early stages of analysis, through the low-level design and into the implementation. It walks readers through the designer’s thoughts, showing the errors, blind alleys, and creative insights that occur throughout the software design process.

REST

Foundations of RESTful Architecture

By Brian Sletten

26,556 Downloads · Refcard 129 of 151 (see them all)

The Essential REST Cheat Sheet

The Representational State Transfer (REST) architectural style is covered in this cheat sheet, a worldview that elevates information into a first-class element of the architectures that we build. This Refcard presents the foundations for RESTful style architecture along with a nice summary of Dr. Roy Fielding’s thesis, which sparked the REST and Service-Oriented Architecture (SOA) movement. It includes information about the Richardson Maturity Model, the verbs for RESTful systems and how they’re used, Response Codes and a treasure trove of REST resources. You’ll also learn about the clear differences between REST and SOAP.

INTRODUCTION

The Representational State Transfer (REST) architectural style is not a technology you can purchase or a library you can add to your software development project. It is a worldview that elevates information into a first class element of the architectures we build.

The ideas and terms we use to describe “RESTful” systems were introduced and collated in Dr. Roy Fielding’s thesis, “Architectural Styles and the Design of Network-based Software Architectures”. This document is academic and uses formal language, but remains accessible and provides the basis for the practice.

The summary of the approach is that by making specific architectural choices, we can elicit desirable properties from the systems we deploy. The constraints detailed in this architectural style are not intended to be used everywhere but they are widely applicable.

The concepts are well demonstrated in a reference implementation we call The Web. Advocates of the REST style are basically encouraging organizations to apply the same principles to coarsely granular information sources within their firewalls as they do to external facing customers with web pages.

THE BASICS

A Uniform Resource Locator (URL) is used to identify and expose a “RESTful service”. This is a logical name that separates the identity of an information resource from what is accepted or returned from the service when it is invoked. The URL scheme is defined in RFC 1738.

A sample RESTful URL might be something like the following fake API for a library:


http://fakelibrary.org/library

The URL functions as a handle for the resource, something that can be requested, updated or deleted.

This starting point would be published somewhere as the way to begin interacting with the library’s REST services. What is returned could be XML, JSON or, more appropriately, a hypermedia format such as Atom or a custom MIME type. The general guidance is to reuse existing formats where possible, but there is a growing tolerance for properly designed media types.

To request the resource, a client would issue a Hypertext Transfer Protocol (HTTP) GET request to retrieve it. This is what happens when you type a URL into a browser and hit return, select a bookmark or click through an anchor reference link.

For programmatic interaction with a RESTful API, any of a dozen or more client side APIs or tools could be used. To use the curl command line tool, you could type something like:


HHood> curl http://fakelibrary.org/library

This will return the default representation on the command line. You may not want the information in this form, however. Fortunately, HTTP has a mechanism by which you can ask for information in a different form. If you specify an “Accept” header in the request and the server supports that representation, the server will return it. This is known as content negotiation and is one of the more underused aspects of HTTP. Again, using curl, this could be done with:


HHood> curl -H "Accept: application/json" http://fakelibrary.org/library

This ability to ask for information in different forms is possible because of the separation of the name of the thing from its form. The ‘R’ in REST is ‘representation’, not ‘resource’. Keep this in mind and build systems that allow clients to ask for information in the forms they want. We will revisit this topic later.
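To make the server's side of content negotiation more concrete, here is a minimal Python sketch of how a service might pick a representation from the “Accept” header. The `negotiate` function is hypothetical and deliberately simplified: it understands quality values (`q=`) and the `*/*` wildcard, but not partial wildcards such as `text/*` or malformed headers, which a production server would have to handle.

```python
def negotiate(accept_header, supported):
    """Pick the supported media type the client prefers most.

    Returns None when nothing matches (a server would answer 406 Not Acceptable).
    supported is a list of media types in the server's own order of preference.
    """
    preferences = []
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        quality = 1.0  # default when the client gives no q value
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                quality = float(value)
        preferences.append((quality, media_type.strip()))

    # Highest quality first; sorted() is stable, so ties keep the client's order.
    for quality, media_type in sorted(preferences, key=lambda pref: -pref[0]):
        if quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return supported[0]
        if media_type in supported:
            return media_type
    return None


supported = ["text/xml", "application/json"]
print(negotiate("application/json", supported))
print(negotiate("text/xml;q=0.5, application/json", supported))
print(negotiate("image/png", supported))
```

The resource keeps a single name while the representation varies per request, which is exactly the separation of name and form described above.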

Possible URLs for our fake library might include:

http://fakelibrary.org/library: general information about the library and the basis for discovering links to search for specific books, DVDs, etc.

http://fakelibrary.org/book: an “information space” for books. Conceptually, it is a placeholder for all possible books. Clearly, if it were resolved, we would not want to return all possible books, but it might perhaps return a way to discover books through categories, keyword search, etc.

http://fakelibrary.org/book/category/1234: within the information space for books, we might imagine browsing them based on particular categories (e.g. adult fiction, children’s books, gardening, etc.). It might make sense to use the Dewey Decimal system for this, but we can also imagine custom groupings. The point is that this “information space” is potentially infinite and driven by what kind of information people will actually care about.

http://fakelibrary.org/book/isbn/978-0596801687: a reference to a particular book. Resolving it should include information about the title, author, publisher, number of copies in the system, number of copies available, etc.

These URLs mentioned above will probably be read-only as far as the library patrons are concerned, but applications used by librarians might actually manipulate these resources.

For instance, to add a new book, we might imagine POSTing an XML representation to the main /book information space. In curl, this might look like:


HHood> curl -u username:password -d @book.xml -H "Content-type: text/xml" http://fakelibrary.org/book

At this point, the resource on the server might validate the results, create the data records associated with the book and return a 201 response code indicating a new resource has been created. The URL for the new resource can be discovered in the Location header of the response.
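The create-and-redirect behavior can be sketched with a small in-memory stand-in for the /book information space. Everything here is hypothetical: the `BookSpace` class, the required `title` field and the numeric identifiers are assumptions for illustration, and a real service would sit behind an HTTP server and parse the submitted XML.

```python
import itertools


class BookSpace:
    """In-memory stand-in for the /book information space."""

    def __init__(self):
        self._books = {}
        self._ids = itertools.count(1)

    def post(self, representation):
        """Create a book. The server, not the client, chooses the new URL."""
        if "title" not in representation:
            return 400, {}  # validation failed: Bad Request
        url = f"/book/{next(self._ids)}"
        self._books[url] = dict(representation)
        # 201 Created, with the new resource's URL in the Location header.
        return 201, {"Location": url}

    def get(self, url):
        """Resolve a book URL to (status, representation)."""
        if url not in self._books:
            return 404, None
        return 200, self._books[url]


space = BookSpace()
status, headers = space.post({"title": "RESTful Web Services"})
print(status, headers["Location"])
print(space.get(headers["Location"]))
```

The client learns the name of the new resource only from the Location header, and can then GET it like any other resource.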

An important aspect of a RESTful request is that each request contains enough state to answer the request. This allows for the conditions of visibility and statelessness on the server, desirable properties for scaling systems up and identifying what requests are being made. This helps enable caching of specific results. The server’s address and the state of the request combine to form a computational hash key into a result set:


http://fakelibrary.org + /book/isbn/978-0596801687

Because of the nature of the GET request (discussed later), this allows a client to make very specific requests, but only if necessary. The client can cache a result locally, the server can cache it remotely or some intermediate architectural element can cache it in the middle. This is an application-independent property that can be designed into our systems.
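The idea of the address plus the request state acting as a hash key can be shown with a tiny client-side cache in Python. This is a sketch under simplifying assumptions: the `CachingClient` class is hypothetical, the function that performs the actual GET is passed in by the caller, and expiry and HTTP cache-control headers, which any real cache must honor, are ignored.

```python
from typing import Callable, Dict


class CachingClient:
    """Caches GET results under the key 'server address + request state'."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # performs the real GET; supplied by the caller
        self._cache: Dict[str, bytes] = {}

    def get(self, server: str, path: str) -> bytes:
        key = server + path  # the complete URL identifies the result
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]


# A stand-in for the network that records every request it receives.
requests_made = []

def fake_fetch(url: str) -> bytes:
    requests_made.append(url)
    return b"<book>...</book>"

client = CachingClient(fake_fetch)
client.get("http://fakelibrary.org", "/book/isbn/978-0596801687")
client.get("http://fakelibrary.org", "/book/isbn/978-0596801687")
print(len(requests_made))
```

The same keying works unchanged in an intermediary or on the server, because nothing in it depends on the application.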

Just because it is possible to manipulate a resource does not mean everyone will be able to do so. We can absolutely put a protection model in place that requires users to authenticate and prove that they are allowed to do something before we allow them to. We will have some pointers to ways of securing RESTful services at the end of this card.

WHAT ABOUT SOAP?

What about it? There is a false equivalence asserted about REST and SOAP that yields more heat than light when they are compared. They are not the same thing. They are not intended to do the same thing even though you can solve many architectural problems with either approach.

The confusion largely stems from the mistaken idea that REST “is about invoking Web services through URLs”. That has about as much truth to it as the idea that “agile methodologies are about avoiding documentation.” Without a deeper understanding of the larger goals of an approach, it is easy to lose the intent of the practices.

REST is best used to manage systems by decoupling the information that is produced and consumed from the technologies that do so. We can achieve the architectural properties of:

  • Performance
  • Scalability
  • Generality
  • Simplicity
  • Modifiability
  • Extensibility

This is not to say SOAP-based systems cannot be built demonstrating some of these properties. SOAP is best leveraged when the lifecycle of a request cannot be maintained in the scope of a single transaction because of technological, organizational or procedural complications.

RICHARDSON MATURITY MODEL

In part to help elucidate the differences between SOAP and REST and to provide a framework for classifying the different kinds of systems many people were inappropriately calling “REST”, Leonard Richardson introduced a Maturity Model. You can think of the classifications as a measure of how closely a system embraces the different pieces of Web Technology: Information resources, HTTP as an application protocol and hypermedia as the medium of control.

Calling it a “maturity model” might seem to suggest that you should only build systems at the most “mature” level. That should not be the take-home message. There is value in being at Level 2, and the shift to Level 3 is often simply the adoption of a new MIME type. The shift from Level 0 to Level 3 is much harder, so even incremental adoption adds value.

Start by identifying the information resources you would like to expose. Adopt HTTP as an application protocol for manipulating these information resources including support for content negotiation. Then, when you are ready to, adopt hypermedia-based MIME types and you should get the full benefits of REST.

VERBS

The limited number of verbs in RESTful systems confuses and frustrates people new to the approach. What seem like arbitrary and unnecessary constraints are actually intended to encourage predictable behavior in non-application-specific ways. By explicitly and clearly defining the behavior of the verbs, clients can be self-empowered to make decisions in the face of network interruptions and failure.

There are four main HTTP verbs (sometimes called methods) used by well-designed RESTful systems.

GET

The most common verb on the Web, a GET request transfers representations of named resources from a server to a client. The client does not necessarily know anything about the resource it is requesting. What it gets back is a bytestream tagged with metadata that indicates how the client should interpret it. On the Web, this is typically “text/html” or “application/xhtml+xml”. As we indicated above, using content negotiation, the client can be proactive about what is requested as long as the server supports it.

One of the key points about the GET request is that it should not modify anything on the server side. It is fundamentally a safe request. Using GET to change state is one of the biggest mistakes made by people new to REST. With RMM Level 1 systems, you often see URLs such as: http://someserver/res/action=update?data=1234

Do not do this! Not only will RESTafarians mock you, but you will not build RESTful ecosystems that yield the desired properties. The safety of a GET request allows it to be cached.

GET requests are also intended to be idempotent. This means that issuing a request more than once has no effect beyond that of issuing it once. This is an important property in a distributed, network-based infrastructure. If a client is interrupted while it is making a GET request, it should be empowered to issue it again because of this property. This is an enormously important point. In a well-designed infrastructure, it does not matter what the client is requesting from which application. There will always be application-specific behavior, but the more we can push into non-application-specific behavior, the more resilient and easier to maintain our systems will be.
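Because a GET is safe and idempotent, a client can retry it without knowing anything about the application behind the URL. The following Python sketch illustrates this; the `get_with_retry` helper is hypothetical, the function that performs the request is supplied by the caller, and real clients would usually add a delay between attempts.

```python
def get_with_retry(fetch, url, attempts=3):
    """Re-issue an interrupted GET. This is only safe because GET is idempotent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except ConnectionError as error:
            last_error = error  # interrupted: simply ask again
    raise last_error


# A stand-in for an unreliable network: fails twice, then succeeds.
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("connection interrupted")
    return "library description"

print(get_with_retry(flaky_fetch, "http://fakelibrary.org/library"))
```

The retry logic contains nothing specific to the library service, which is the kind of non-application-specific behavior the verbs are meant to enable.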

POST

The situation gets a little less clear when we consider the intent of the POST and PUT verbs. Based on their definitions, both seem to be used to create or update a resource from the client to the server. They have distinct purposes, however.

POST is used when the client cannot predict the identity of the resource it is requesting to be created. When we hire people, place orders, submit forms, etc., we cannot predict how the server will name these resources we are creating. This is why we POST a representation of the resource to a handler (e.g. servlet). The server will accept the input, validate it, verify the user’s credentials, etc. Upon successful processing, the server will return a 201 HTTP response code with a “Location” header indicating the location of the newly created resource.

Note: Some people treat POST like a conversational GET on creation requests. Instead of returning a 201, they return a 200 with the body of the resource created. This seems like a shortcut to avoid a second request, but it also conflates POST and GET and complicates the potential for caching the resource. Try to avoid the urge to take shortcuts at the expense of the larger picture. It seems worth it in the short-term, but over time, these shortcuts will add up and will likely work against you.

Another major use of the POST verb is to “append” to a resource. This is an incremental edit or a partial update, not a full resource submission; for a full submission, use the PUT operation. A POST update to a known resource would be used for something like adding a new shipping address to an order or updating the quantity of an item in a cart.

Because of this partial update potential, POST is neither safe nor idempotent.

A final common use of POST is to submit queries. Either a representation of a query or URL-encoded form values are submitted to a service to interpret the query. It is usually fair to return results directly from this kind of a POST since there is no identity associated with the query.

Note: Consider turning a query like this into an information resource itself. If you POST the definition into a query information space, you can then issue GET requests to it, which can be cached. You can also share this link with others.
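
A sketch of that idea, under stated assumptions: the query definition is a small dictionary with hypothetical `field` and `equals` keys, the `/queries/{key}` URL space is invented for the example, and the key is derived from a hash of the definition so that the same query always maps to the same link.

```python
import hashlib
import json

queries = {}  # hypothetical store of query definitions, keyed by their identity


def post_query(definition):
    """POST a query definition; the server gives it a URL that can be shared."""
    canonical = json.dumps(definition, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(canonical).hexdigest()[:12]
    queries[key] = definition
    return 201, {"Location": f"/queries/{key}"}


def get_query_results(key, data):
    """GET the query resource: safe, repeatable, and therefore cacheable."""
    definition = queries.get(key)
    if definition is None:
        return 404, None
    field, value = definition["field"], definition["equals"]
    return 200, [row for row in data if row.get(field) == value]
```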

PUT

Many developers largely ignore the PUT verb because HTML forms do not support it. It serves an important purpose, however, and is part of the full vision for RESTful systems.

When a client has a URL reference to an existing resource and wishes to update it, PUTing a representation to the URL serves as an overwrite action. This distinction allows a PUT request to be idempotent in a way that POST updates are not.

If a client is in the process of issuing a PUT overwrite and it is interrupted, it can feel empowered to issue it again because an overwrite action can be reissued with no consequences; the client is attempting to control the state, so it can simply reissue the command.

Note: This protocol-level handling does not necessarily preclude the need for higher (application-level) transactional handling, but again, it is an architecturally desirable property to bake in below the application level.

A PUT can also be used to create a resource if the client is able to predict the resource’s identity. This is usually not the case as we discussed under the POST section, but if the client is in control of the server side information spaces, it is a reasonable thing to allow. Publishing into a user’s weblog space is a typical example of PUTing to a user-specified name.
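
The difference between a PUT overwrite and a POST append can be illustrated with a small in-memory sketch. The `store` dictionary, the function names and the `addresses` field are assumptions made for the example, not part of any framework.

```python
def put(store, url, representation):
    """PUT: overwrite (or create) the whole resource at a URL the client chose."""
    created = url not in store
    store[url] = dict(representation)
    return 201 if created else 200


def post_append(store, url, item):
    """POST: append to an existing resource; each repeat changes state again."""
    if url not in store:
        return 404
    store[url].setdefault("addresses", []).append(item)
    return 200
```

Reissuing the same `put` call leaves the store exactly as it was after the first call, while reissuing `post_append` adds a second copy of the address, which is why an interrupted PUT can be repeated blindly and an interrupted POST cannot.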

DELETE

The DELETE verb does not find wide use on the public Web (thankfully!), but for information spaces you control, it is a useful part of a resource’s lifecycle.

DELETE requests are intended to be idempotent, so you should generally build resources that respond to DELETE requests by failing silently and returning a 200 even if it has already been deleted. This may require extra state management on the server to differentiate between DELETE requests to things that no longer exist, versus requests to things that never existed.

Some security policies may require you to return a 404 for nonexistent or deleted resources so DELETE requests do not leak information about the presence of resources.
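
One possible shape for the extra state management is a set of tombstones, sketched below. The `Resources` class and the `hide_existence` flag are hypothetical names; the flag stands in for the security policy that prefers a 404 for both deleted and never-existing resources.

```python
class Resources:
    """Hypothetical in-memory resource space that remembers what was deleted."""

    def __init__(self):
        self.live = {}           # url -> representation
        self.tombstones = set()  # urls that existed once and were deleted

    def delete(self, url, hide_existence=False):
        if url in self.live:
            del self.live[url]
            self.tombstones.add(url)
            return 200
        if url in self.tombstones and not hide_existence:
            return 200  # repeat of an earlier DELETE: fail silently
        return 404      # never existed, or policy says not to reveal the difference
```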

There are three other verbs that are not as widely used but provide value.

HEAD

The HEAD verb is used to issue a request for a resource without actually retrieving it. It is a way for a client to check for the existence of a resource and possibly discover metadata about it.

OPTIONS

The OPTIONS verb is also used to interrogate a server about a resource by asking what other verbs are applicable to the resource.
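
A rough sketch of how a read-only resource might answer HEAD and OPTIONS follows. The `handle` function, the shape of the `resources` dictionary and the fixed `ALLOWED` tuple are assumptions for illustration only.

```python
ALLOWED = ("GET", "HEAD", "OPTIONS")  # verbs this hypothetical read-only resource supports


def handle(resources, method, url):
    """Return (status, headers, body) for a request against an in-memory resource."""
    resource = resources.get(url)
    if resource is None:
        return 404, {}, b""
    if method == "OPTIONS":
        # Tell the client which verbs apply to this resource.
        return 200, {"Allow": ", ".join(ALLOWED)}, b""
    if method not in ALLOWED:
        return 405, {"Allow": ", ".join(ALLOWED)}, b""
    body = resource["body"]
    headers = {"Content-Type": resource["type"], "Content-Length": str(len(body))}
    # HEAD returns the same metadata as GET, without the representation itself.
    return 200, headers, (b"" if method == "HEAD" else body)
```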

PATCH

The newest of the verbs, PATCH was only officially adopted as part of HTTP in early 2010. The goal is to provide a standardized way to express partial updates. The POST method is basically unconstrained so it tends to defy constraint.

A PATCH request in a standard format could allow an interaction to be more explicit about the intent. There are currently no standardized patch formats in wide RESTful use, but they are likely to be designed for XML, HTML, plain text and other common formats.
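
To make the idea of an explicit partial update concrete, here is a sketch of a merge-style patch applied to a dictionary representation. The format is a hypothetical one invented for this example (a value of `None` removes a field, nested dictionaries are merged), not a standardized patch format.

```python
def apply_merge_patch(resource, patch):
    """Return a patched copy of `resource`; the original is left untouched.

    Hypothetical rules: None removes a field, nested dicts are merged
    recursively, and any other value replaces the existing one.
    """
    result = dict(resource)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_merge_patch(result[key], value)
        else:
            result[key] = value
    return result
```

Unlike a POST body, a patch document of this kind states exactly which fields change, so intermediaries and servers can reason about the request without application-specific knowledge.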

RESPONSE CODES

HTTP response codes give us a rich dialogue between clients and servers about the status of a request. Most people are only familiar with 200, 403, 404 and maybe 500 in a general sense, but there are many more useful codes to use. The tables presented here are not comprehensive, but cover many of the most important codes you should consider using in a RESTful environment.

The first collection of response codes indicates that the client request was well formed and processed. The specific action taken is indicated by one of the following.

Code Description
200 OK. The request has successfully executed. Response depends upon the verb invoked.
201 Created. The request has successfully executed and a new resource has been created in the process. The response body is either empty or contains a representation revealing URIs for the resource created. The Location header in the response should point to the new URI as well.
202 Accepted. The request was valid and has been accepted but has not yet been processed. The response should include a URI to poll for status updates on the request. This allows asynchronous REST requests.
204 No Content. The request was successfully processed but the server did not have any response. The client should not update its display.

Table 1: Successful Client Requests
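
The 202 Accepted pattern from Table 1 can be sketched as a submit-then-poll exchange. The job store, the `/jobs/{id}` URL layout and the trivial "work" (summing a list) are all assumptions made for illustration.

```python
import itertools

_job_ids = itertools.count(1)
jobs = {}  # hypothetical in-memory job table


def submit_report(params):
    """Accept the request without processing it; point the client at a status URL."""
    job_id = next(_job_ids)
    jobs[job_id] = {"state": "pending", "params": params, "result": None}
    return 202, {"Location": f"/jobs/{job_id}"}


def run_pending():
    """Stand-in for a background worker that processes accepted requests later."""
    for job in jobs.values():
        if job["state"] == "pending":
            job["result"] = sum(job["params"])
            job["state"] = "done"


def get_job(job_id):
    """Polling endpoint: a plain GET on the status resource."""
    job = jobs.get(job_id)
    if job is None:
        return 404, None
    return 200, {"state": job["state"], "result": job["result"]}
```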

The second collection of response codes indicates that the client should look elsewhere for the resource or information about it due to movement or some other situation.

Code Description
301 Moved Permanently. The requested resource is no longer located at the specified URL. The new Location should be returned in the response header. Only GET or HEAD requests should redirect to the new location. The client should update its bookmark if possible.
302 Found. The requested resource has temporarily been found somewhere else. The temporary Location should be returned in the response header. Only GET or HEAD requests should redirect to the new location. The client need not update its bookmark as the resource may return to this URL.
303 See Other. This response code has been reinterpreted by the W3C Technical Architecture Group (TAG) as a way of responding to a valid request for a non-network addressable resource. This is an important concept in the Semantic Web when we give URIs to people, concepts, organizations, etc. There is a distinction between resources that can be found on the Web and those that cannot. Clients can tell this difference if they get a 303 instead of 200. The redirected Location will be reflected in the Location header in the response. It will contain a reference to a document about the resource or perhaps some metadata about it. This is not a universally popular decision but is currently the provided guidance.

Table 2: Redirected Client Requests
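
A client-side sketch of the redirect rules in Table 2 is shown below. The function name `next_request` is hypothetical; it returns the request a client may issue automatically, or `None` when it should stop and involve the user or application.

```python
from urllib.parse import urljoin


def next_request(method, url, status, headers):
    """Decide whether a redirect may be followed automatically.

    Returns (method, absolute_url) or None.
    """
    location = headers.get("Location")
    if location is None:
        return None
    target = urljoin(url, location)  # Location may be relative to the request URL
    if status == 303:
        return "GET", target         # "See Other": fetch the referenced document with GET
    if status in (301, 302) and method in ("GET", "HEAD"):
        return method, target        # only safe requests are redirected automatically
    return None
```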

The third collection of response codes indicates that the client request was somehow invalid and will not be handled successfully if reissued in the same condition. These failures include potentially improperly formatted requests, unauthorized requests, requests for resources that do not exist, etc.

Code Description
400 Bad Request. Generally the sign of a malformed or otherwise invalid request.
401 Unauthorized. Without further authorization credentials, the client is not allowed to issue the request. The inclusion of an “Authorization” header with valid credentials might still succeed.
403 Forbidden. The server is disallowing the request. Extra credentials will not help.
404 Not Found. The server could not match the request to a known resource.
405 Method Not Allowed. The requested method (verb) is not allowed for that resource. Response will indicate in an “Allow” header what is allowed.
406 Not Acceptable. The server cannot generate a representation compatible with what was asked for in the request “Accept” header.
410 Gone. The resource is explicitly no longer available and will not be in the future.
411 Length Required. The server requires the client to specify a “Content-Length” header indicating the size of the request. A resubmit with this header might succeed.
413 Request Entity Too Large. The request entity is too large for the server to process.
415 Unsupported Media Type. The client submitted a media type that is incompatible for the specified resource.

Table 3: Invalid Client Requests
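
As a sketch of how a server might choose among several of the codes in Table 3, the function below runs a fixed sequence of checks. The request and resource dictionaries, the token-based `users` lookup and the exact-match handling of the Accept header are simplifying assumptions; real content negotiation is considerably richer.

```python
def check_request(request, resource):
    """Return a 4xx status for the first failed check, or None if the request may proceed."""
    headers = request["headers"]
    method = request["method"]
    user = resource["users"].get(headers.get("Authorization"))
    if user is None:
        return 401  # missing or unknown credentials; valid ones might still succeed
    if method not in user["may"]:
        return 403  # known user, but extra credentials will not help
    if method in ("POST", "PUT"):
        if "Content-Length" not in headers:
            return 411  # a resubmit with the header might succeed
        if headers.get("Content-Type") not in resource["consumes"]:
            return 415  # media type not supported by this resource
    accept = headers.get("Accept", "*/*")
    if accept != "*/*" and accept not in resource["produces"]:
        return 406  # cannot produce a representation the client accepts
    return None
```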

The final collection of response codes indicates that the server was temporarily unable to handle the client request (which may still be invalid) and that the client should reissue the command at some point in the future.

Code Description
500 Internal Server Error. A catchall for server processing problems.
503 Service Unavailable. A temporary response in the face of too many requests. The client may attempt to retry the request in the future at a time specified in a “Retry-After” header.

Table 4: Server Failed to Handle the Request
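
A client that honors a 503 needs to interpret the Retry-After header, which may carry either a number of seconds or an HTTP date. The helper below is a minimal sketch using only the Python standard library; the function name and the choice to return `None` for unparseable values are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Convert a Retry-After header value into a delay in seconds, or None if unparseable."""
    now = now or datetime.now(timezone.utc)
    value = retry_after.strip()
    if value.isdigit():
        return float(value)              # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```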

REST RESOURCES

Thesis

Dr. Fielding’s thesis, “Architectural Styles and the Design of Network-based Software Architectures” is the main introduction to the ideas discussed here: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm.

RFCs

The specifications for the technologies that define the most common uses of REST are driven by the Internet Engineering Task Force (IETF) Request for Comments (RFC) process. Specifications are given numbers and updated occasionally over time with new versions that obsolete existing ones. At the moment, here are the latest relevant RFCs.

URI

The generic syntax of URIs as a naming scheme is covered in RFC 3986. These can include encoding other naming schemes such as website addresses, namespace-aware sub-schemes, etc. Site: http://www.ietf.org/rfc/rfc3986.txt

URL

A Uniform Resource Locator (URL) is a form of URI that has sufficient information embedded within it (access scheme and address usually) to resolve and locate the resource. Site: http://www.ietf.org/rfc/rfc1738.txt

IRI

An Internationalized Resource Identifier (IRI) is conceptually a URI encoded in Unicode to support characters from the languages of the world. The IETF chose to create a new standard rather than change the URI scheme itself to avoid breaking existing systems and to draw explicit distinctions between the two approaches. Supporting IRIs becomes a deliberate act. There are mapping schemes defined for converting between IRIs and URIs as well. Site: http://www.ietf.org/rfc/rfc3987.txt

HTTP

The Hypertext Transfer Protocol (HTTP) version 1.1 defines an application protocol for manipulating information resources generally represented in hypermedia formats. While it is an application-level protocol, it is generally not application specific and important architectural benefits emerge as a result. Most people think of it and the Hypertext Markup Language (HTML) as “The Web”, but HTTP is useful in the development of non-document-oriented systems as well.

Site: http://www.ietf.org/rfc/rfc2616.txt

Implementations

There are several libraries and frameworks available for building systems that produce and consume RESTful systems. While any Web server can be configured to supply a REST API, these frameworks, libraries and environments make it easier to do so.

Here is an overview of some of the main environments:

JSR-311 (Jersey)

This was an attempt to add REST to the J2EE environment. The original focus was on server side issues, but a client API has emerged.

The basic idea is that classes (either POJOs or specific resource classes) are annotated to indicate how they should participate in a RESTful environment. These classes can be deployed into any system that knows how to parse the annotated classes.
Site: http://wikis.sun.com/display/Jersey/Main
Samples: http://blogs.sun.com/sandoz/entry/jersey_samples

Restlet

The Restlet API was one of the first attempts at creating a Java API for producing and consuming RESTful systems. The attention paid to both the client and server sides of the equation yields some very clean and powerful APIs.

Additionally, Restlet-based systems can easily be deployed into various containers including typical servlet-based containers, Grizzly (https://grizzly.dev.java.net), the Simple Framework (http://simpleweb.sourceforge.net), etc.

Restlet supports JSR-311 annotations and provides RESTful connections to many data types, sources and systems.
Site: http://restlet.org

NetKernel

One of the more interesting RESTful systems, NetKernel represents a microkernel-based environment supporting a wide variety of architectural styles. It benefits from the adoption of the economic properties of the Web in software architecture. You can think of it as “bringing REST inside”. Whereas any REST-based system kind of looks the same externally, NetKernel continues to look like that within its execution environment as well.

Internally, components are loosely coupled through URI-based invocations in similar ways to how documents are linked on the Web. This yields important architectural properties of flexibility and scalability.

NetKernel makes it very easy to work with a variety of data types, services and sources in a resource-oriented and powerful way.
Site: http://netkernel.org

Sinatra

Sinatra is a domain specific language (DSL) for creating RESTful applications in Ruby.
Site: http://www.sinatrarb.com

Compojure-REST

A thin layer on top of Compojure (a Clojure-based Web framework) for building RESTful APIs.

Site: https://github.com/ordnungswidrig/compojure-rest
Compojure Site: https://github.com/weavejester/compojure/wiki

OpenRasta

OpenRasta brings the concept of REST to the .NET platform in ways that allow it to be deployed alongside ASP.NET and WCF components.

Site: http://trac.caffeine-it.com/openrasta/wiki/Doc

There are many other implementations to investigate. For more information, please consult this list of known implementations: http://code.google.com/p/implementing-rest/wiki/RESTFrameworks

Books

“RESTful Web Services” by Leonard Richardson and Sam Ruby, 2007. O’Reilly Media.
“RESTful Web Services Cookbook” by Subbu Allamaraju, 2010. O’Reilly Media.
“REST in Practice” by Jim Webber, Savas Parastatidis and Ian Robinson, 2010. O’Reilly Media.
“Restlet in Action” by Jerome Louvel and Thierry Boileau, 2011. Manning Publications.
“Resource-Oriented Architectures : Building Webs of Data” by Brian Sletten, 2011. Addison-Wesley.

Websites

REST Wiki:

Site: http://rest.blueoxen.net

This Week in REST:

Site: http://thisweekinrest.wordpress.com

Mailing Lists

Rest-discuss: One of the most active and opinionated mailing lists for discussion of REST topics. Many of the most influential minds in the field congregate here to discuss both fundamental and esoteric nuances of the architectural style. This is best used as a read-only learning resource until you have mastered the basics and need illumination on finer points. Consider searching the archives before asking introductory questions.

Site: http://tech.groups.yahoo.com/group/rest-discuss/

About The Authors

Brian Sletten

Brian Sletten is a liberal arts-educated software engineer with a focus on forward-leaning technologies. He has a background as a system architect, a developer, a mentor and a trainer. His experience has spanned the online games, defense, finance and commercial domains with security consulting, network matrix switch controls, 3D simulation/visualization, Grid Computing, P2P and Semantic Web-based systems. He has a B.S. in Computer Science from the College of William and Mary. He is President of Bosatsu Consulting, Inc. and lives in Los Angeles, CA.

Recommended Book

RESTful web services

This cookbook includes more than 100 recipes to help you take advantage of REST, HTTP, and the infrastructure of the Web. You’ll learn ways to design RESTful web services for client and server applications that meet performance, scalability, reliability, and security goals, no matter what programming language and development framework you use.


SOA Governance

By Todd Biske

The Essential SOA Governance Cheat Sheet

Service-Oriented Architecture (SOA) governance is the combination of people, policies, and processes that an organization leverages to achieve the desired behavior in their SOA adoption efforts. SOA needs a solid foundation that is based on standards and includes policies, contracts, and service level agreements. SOA Governance must enable an organization to answer two fundamental questions: What are the right services to build? How do I build services the right way? This DZone Refcard provides Enterprise Architects, senior IT managers, and SOA program managers with an overview of the considerations, approaches, and technologies necessary for successful SOA governance.

ABOUT SOA Governance

SOA governance is the combination of people, policies, and processes that an organization leverages to achieve the desired behavior in their SOA adoption efforts. This refcard provides Enterprise Architects, senior IT managers, and SOA program managers with an overview of the considerations, approaches, and technologies necessary for successful SOA governance.

The hardest part of SOA adoption is the cultural change required. This can include changes not only in the technologies utilized in building solutions, but also the processes by which solutions are initially defined, funded, resourced, managed in production, and changed per the needs of the business. SOA governance must enable an organization to answer two fundamental questions. What are the right services to build? How do I build services the right way?

Beyond these fundamental questions, SOA governance must go deeper to ensure that the desired behaviors of the adoption effort are achieved. This is especially critical, because the typical desired behaviors, such as increased agility and reduction or elimination of redundancy, will not be achieved through a single project or program, but over the successful execution of many projects, and sustained through the projects that follow. It is governance that must provide the policies that guide how those projects are defined, executed, and maintained so that goals are met.

People

Governance must begin with, but not end with, people. People must take the desired outcome of the SOA effort, determine what the desired behaviors are, and define the policies that comprise that behavior. These people must be viewed as authorities in the organization. Without authority, the organization will not respect the policies that have been established. At the same time, the selection of the people that will be associated with your governance effort and how they interact with the people executing the projects is the area of most risk in implementing SOA governance. From the people chosen, to the method of interaction with the project teams, to the decision making culture of the organization, the people can be the difference between effective and ineffective governance.

Determining your Organizational approach

When determining the people that will be part of your SOA governance effort, you must determine how those people will be organized. The challenge you will face is that an SOA effort will touch all areas of your IT organization, and perhaps even outside of IT. It is likely that you have teams in your organization today that are involved with some aspects of governance, such as Enterprise Architecture, but it is unlikely that a single team has domain over all aspects of the SOA effort. Are your architects setting policies that determine what projects get funded? Are your managers defining policies that guide how solutions get built? The SOA effort will challenge the existing organizational structure and create dependencies across teams where they did not exist before.

Hot Tip

The most important thing is to have your governance efforts match the culture of the organization, presuming that your organization is already effective in its decision making and policy setting processes. If it is not, then perhaps a change is in order.

Command-and-Control Organizations

Organizations that operate in a manner that always looks up the organization chart for policy definition and decision making are typically known as command-and-control organizations. While the rigidity of the processes and recognition of the chain of command may be a turn-off for many, this does not mean that the organization cannot be effective with this approach. Sometimes it is the only way to get things back in line if the current processes are very chaotic and ineffective. In such an organization, having the SOA governance team show up on the organization chart either as a new entity or as a collection of dotted lines that report in to the highest levels within IT may be required to ensure that the authority of the team is recognized.

Consensus-Driven Organizations

At the opposite extreme are organizations whose decision making process is completely independent from the organizational structure. These consensus-driven organizations focus on bringing the right people together for the question at hand, and allow them to reach the appropriate decision. Their effectiveness is based on how quickly the people involved can reach consensus. The risk with this approach is that decisions are not made in a timely manner, because consensus is very difficult to reach. If your organization is an effective consensus-driven organization, a formal SOA governance team may not be necessary. Rather, the right people will naturally come together to establish policy because that’s how the organization operates.

SOA Center of Excellence

In all likelihood, your organization will fall somewhere in the middle. The commonality across both approaches is that the SOA governance effort must recognize that SOA will touch many, if not all, areas of your IT organization, and will likely cross the existing organizational boundaries. By drawing people from multiple areas into the effort, you lessen the chance of missing some piece of the SOA governance puzzle. This collection of people will form your SOA Center of Excellence. You can choose to formally represent that within your organization chart, if necessary, or it can simply be another one of many “virtual teams” that a consensus-driven organization naturally leverages to ensure success.

Roles

Because of the domains of policies that will be necessary for your SOA efforts, there are many roles in the organization that will need to be involved with the governance effort.

Role Description
Enterprise Architect: Responsible for establishing technical policies, expressed in the form of reference architectures, that define how services will be built and infrastructure leveraged.
Solution Architect: Responsible for the technical leadership and decisions on a project or program, and for the project’s compliance with the technical design-time policies.
Information Architect: Responsible for ensuring the consistent representation of information on service messages.
Technical Lead/Domain Architect: Responsible for technical policies within a smaller domain, expressed through domain-specific reference architectures.
Business Analyst: Responsible for the functional aspects and business analysis activities of the solution, including defining what services get built.
Security Architect: Responsible for ensuring that the technology solutions of the company take appropriate measures to protect its corporate assets and intellectual property.
Platform Manager: Responsible for the technology platform where services (and consumers, for internal consumers) are hosted, and the entry criteria for deployment.
Service Manager: Responsible for managing the relationships with service consumers via service contracts, managing the service lifecycle, and ensuring that proper run-time governance is in place.
IT Manager: Responsible for managing personnel, work, and budget.

Policies

Establishing a center of excellence is not enough to ensure success with your SOA governance and SOA adoption efforts. While governance begins with people, it cannot end there. If all decisions must be made and/or reviewed by the center of excellence, your efforts will not scale as the COE will become a bottleneck for all of your projects.

In order to allow your governance efforts to scale, the people must focus first on establishing policies. Policies are the standards and guidelines that enable the staff executing your SOA projects to make appropriate decisions and achieve the desired behaviors. There are three key timeframes for which policies are necessary:

Pre-Project Governance: In this timeframe, the primary concern is with determining what IT projects to fund and execute, which is frequently associated with the broader subject of IT governance. While SOA governance should not introduce new governance processes associated with deciding what projects to fund and execute, policies associated with SOA governance should be included in the criteria. This is the timeframe where the policies must aid in answering the question, “What are the right services to build?”
Project Governance: This is also referred to as design-time governance; however, the activities in this time frame are concerned with much more than solution design. The policies at this stage must aid in answering the question, “How do I build my services the right way?”
Run-Time Governance: Governance does not end when the projects are complete. SOA adoption can increase the number of moving parts in any given solution, and if the dependencies aren’t managed appropriately chaos can ensue. You need to have appropriate run-time policies to ensure the systems behave appropriately while in use and that the relationships between service consumers and service providers are managed.

Pre-Project Governance

There are key artifacts that can assist in defining policies at this timeframe.

Organization Charts: Can influence how funds are distributed in an organization, and this can be a barrier to SOA adoption. The organization chart must be leveraged and modified where necessary to account for clear service ownership and funding approaches. If the organizational issues aren’t discussed at the time projects are defined and funded, it can severely hamper the efforts of the project team.
Business Process Models, Business Domain/Capability Models and Application Portfolio: All are closely related. These are analysis artifacts that should be used to guide the decisions on what services should be created. Business process models and the application portfolio provide excellent context on existing applications and processes, but each on their own runs the risk of focusing on application silos or process silos. By combining these two with a capability map against the business domains, in essence a heat map of business capabilities, the areas where shared services will provide the most value can more easily be identified. It is important to do this outside of the context of any project, as within a project, there is significant pressure to constrain analysis efforts and avoid “analysis paralysis.” A good reference for this approach is the book “Enterprise SOA Adoption Strategies” by Steve Jones of CapGemini.
Service Portfolio: The last artifact is a catalog of services that have been built and are available in production, but it is much more powerful when it is used as a planning tool. When the organization has taken the time to perform business process analysis and business domain/capability analysis, an outcome should be the definition of key services that the organization needs to create to fully leverage SOA.

Policies for Pre-Project Governance
The following are questions/policies that you should consider in your pre-project governance efforts:

Pre-project governance
Has the proposed project identified candidate services?
Has the proposed project mapped candidate services to the business domains as represented in the business domain/capability models?
Has the proposed project reviewed the service portfolio against the list of candidate services?
Has an appropriate team of project stakeholders been identified based upon candidate services?
Has the proposed program/project been appropriately structured and scheduled to properly manage the development and integration of new and existing services?
Have all funding details been determined based upon the services proposed and the organizations involved?
Does the roadmap include the development of services with high potential for reuse?
Are projects encouraged to reuse existing services, where appropriate, based upon the business domain models and business objectives?
Are projects allowed to create redundancies, where appropriate, based upon the business domain models and business objectives?
Have existing systems been taken into account in the definition of the proposed services?
Is the organizational structure being reviewed on a regular basis based upon continued service analysis?
Does the organization have a clear approach to resolving service ownership models?
Are business processes properly leveraging services?
Does your service portfolio properly account for any globalization impact?
Does the service portfolio properly account for any planned areas for growth by acquisition?

Project Governance

In order to perform project governance, the following artifacts are recommended as sources of policy:

Service technology reference architecture
This artifact helps ensure that appropriate technologies are used for the service being developed. It should first define the appropriate service types for the organization and then map those types to specific service technologies. Service types can include:

Composite Services: These are services that are built by combining the output of two or more services and aggregating the respective responses into a single response.
Automated (Orchestrated) Processes: These are services that are built by executing a fully automated sequence of actions as represented in a graphical process model.
Integration Services: These are services whose purpose is to enable a system that does not support service standards to speak with service consumers.
Presentation Services: These are services that provide information in a presentation-friendly format, providing information in such a way that it is easily consumed by user interface technologies.
Management Services: These are services that expose management and administrative functionality. There are technologies, such as SNMP, JMX, and WS-Management, that have been tailored for this purpose.
Information Services: These are services that are used to retrieve information from a variety of data sources, aggregating the results into a single response. These services differ from composite services in that they are specifically designed to talk only to data sources on the back end, rather than any arbitrary web service.
Content Subscription Services: These are services that provide content feeds, typically adhering to feed syndication standards such as RSS and Atom.
General Business Services: This is a catch-all category for any service that doesn’t fit into any of the other categories.

These service types are mapped to service technologies, defining both the service platform as well as the service communication technologies. The service platform defines where the service implementation will be hosted and executed, such as a Java Application Server or a BPEL orchestration platform, while the service communication technology defines the message formats and the communication technologies used to interact with the service, such as XML, SOAP, and HTTP. The mapping effort links service types to technologies.
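
One way to make such a mapping checkable by project teams is to record it as data. The sketch below is purely illustrative: the specific pairings of service type, platform and communication technology are assumptions chosen for the example, not recommendations from this Refcard, and each organization would substitute its own.

```python
# Hypothetical mapping of service types to approved technologies.
SERVICE_TECHNOLOGIES = {
    "composite": {
        "platform": "Java application server",
        "communication": ["SOAP", "XML", "HTTP"],
    },
    "automated process": {
        "platform": "BPEL orchestration platform",
        "communication": ["SOAP", "XML", "HTTP"],
    },
    "content subscription": {
        "platform": "Java application server",
        "communication": ["Atom", "HTTP"],
    },
}


def technologies_for(service_type):
    """Look up the approved platform and communication technologies for a service type."""
    try:
        return SERVICE_TECHNOLOGIES[service_type.lower()]
    except KeyError:
        raise ValueError(
            f"no approved technology mapping for service type {service_type!r}"
        ) from None
```

A lookup that fails loudly for an unmapped service type gives the governance team a natural trigger to extend the reference architecture rather than have projects improvise.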

The service technology reference architecture must also provide policies on how the non-functional capabilities associated with service interactions will be provided by the underlying infrastructure. This includes:

  • Security
  • Routing and load balancing
  • Transport and mediation
  • High availability and failover
  • Monitoring and management
  • Versioning and transformations

These non-functional capabilities should be enforced through policy-driven infrastructure that is configured, rather than coded.
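
As an illustration of "configured, rather than coded", the sketch below keeps the rules in a data structure that an intermediary evaluates for every message. The policy keys and message fields are assumptions made for this example, not a standard format.

```python
# Hypothetical declarative policy for one service. Changing behavior means
# editing this data, not the service implementation.
POLICY = {
    "require_identity": True,
    "max_message_bytes": 1024,
    "allowed_versions": {"1.0", "1.1"},
}


def enforce(message, policy):
    """Return the reasons a message is rejected; an empty list means it passes."""
    reasons = []
    if policy.get("require_identity") and not message.get("identity"):
        reasons.append("missing identity")
    limit = policy.get("max_message_bytes")
    if limit is not None and len(message.get("body", "").encode("utf-8")) > limit:
        reasons.append("message too large")
    allowed = policy.get("allowed_versions")
    if allowed is not None and message.get("version") not in allowed:
        reasons.append("unsupported version")
    return reasons
```

Because the service code never sees these rules, the same `enforce` routine can serve every service with a different `POLICY` value.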

Hot Tip

These non-functional capabilities are a key aspect of SOA, because they are the foundation of run-time governance. At the same time, they must be factored into the design-time decisions, because if the development teams don’t utilize the technology appropriately, the ability to enforce run-time governance policies will disappear.

Service security reference architecture
The next artifact is the service security reference architecture. This can be included as a subset of the Service Technology Reference Architecture, or created as a standalone artifact. Regardless of the approach, there are two questions that must be answered by the reference architecture: What security policies must be enforced? What technologies are used for enforcing those policies?

The answers to these questions must guide the developers of services and their consumers on security policies for authentication, authorization, encryption, digital signatures, and threat prevention. This includes how identity is represented on messages, how it flows through a chain of service invocations, how and where authorization decisions are made, what type of data should be encrypted and where it should be encrypted, how and where incoming and outgoing messages are checked for potentially malicious content, and more.

Service blueprints and frameworks
Service blueprints are examples of common patterns that demonstrate a policy-compliant way of solving a particular problem. Their use can make the collection of policies associated with a reference architecture much less daunting to a project team. Common patterns may include integration scenarios with legacy systems, exposing services to external parties, and consuming services provided by external parties.

Service frameworks are reusable libraries that, when used, allow implementations to be compliant with the policies of the organization, such as using the correct security credentials on service messages. Compliance is easiest when it is the path of least resistance, and making it so a developer only needs to write one or two lines of code, or even none, can make that happen.
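
A minimal sketch of the "one line of code" idea follows, assuming messages are dictionaries with a `headers` entry. The decorator name `with_identity`, the credential source, and `send_request` are all hypothetical stand-ins for whatever an organization's framework provides.

```python
import functools


def with_identity(get_credentials):
    """Decorator that adds the caller's identity to every outgoing message."""
    def decorator(send):
        @functools.wraps(send)
        def wrapper(message, *args, **kwargs):
            headers = dict(message.get("headers", {}))
            # Keep an identity that is already present; otherwise add ours.
            headers.setdefault("identity", get_credentials())
            return send({**message, "headers": headers}, *args, **kwargs)
        return wrapper
    return decorator


@with_identity(lambda: "order-service")  # hypothetical credential source
def send_request(message):
    # Stand-in for the real transport call; it returns what would be sent.
    return message
```

The developer's only obligation is the decorator line; the identity policy itself lives in the shared library.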

Standard information models and schemas
Standard information models and schemas do not present one universal representation that everyone agrees on, because one probably doesn’t exist. Rather, they ensure consistency in the way information is represented, minimizing the number of representations. Standards from industry verticals, such as SWIFT (financial services), HIPAA (healthcare), and ACORD (insurance), can be leveraged as starting points, and are likely required when exposing services externally.

Policies for project governance
The following are questions/policies that you should consider in your project governance efforts, in addition to all those that were specified in the pre-project governance if not enforced at that time:

Policies for project governance
Have all services been mapped to an appropriate type?
Are the service technologies chosen for each service consistent with the type to technology mapping specified in the reference architecture?
Does the service use the standard communication technologies specified in the reference architecture?
Does the service interface comply with all naming conventions for URLs as specified in the reference architecture?
Does the service interface properly reference all external schema definitions, rather than copying them locally?
Does the service interface use the standard schema definitions properly?
Do external facing services only expose industry standard schemas, where they exist?
Is the service interface compliant with industry standards, such as WS-I?
Does the service require identity on its messages?
Are all service consumers properly specifying identity on outgoing requests?
Have appropriate authorization policies been established for the service?
Is the service communication infrastructure being leveraged appropriately?
Are all internal consumers properly leveraging the standard service frameworks?
Is all sensitive information properly encrypted according to the service security policies?

Have service contracts been established between all consumers and providers?
Are all aspects of the service contract fully specified including message schemas, versions, delivery schedule, points of contact, and expected usage rates?
Have all services been thoroughly and adequately tested, with testing results available to service consumers, if required by the service contract? For internal consumers, testing results should always be available to help counter the natural tendency for developers to resist using things they didn’t personally write.
Have service managers been assigned for all new services?
Are the service boundaries identified in the solution consistent with the business domain models?
Has the solution incorporated existing services appropriately?
Has the solution properly published information about new services into the Service Registry/Repository?
Has the solution avoided creating redundant services where not appropriate according to the business domain models?

Run-time Governance

During this timeframe, the major concern is the correct behavior of service consumers and service providers so that the infrastructure remains operational and in a healthy state at all times. At its core, the run-time infrastructure consists of three things: infrastructure used to execute the logic associated with the service consumer, infrastructure used to execute the logic associated with the service provider, and infrastructure used to allow communication between the two. Three core principles should be adopted:

Service Consumers
Responsible for ensuring that all messages they send are compliant with the service communication standards.
Service Providers
Responsible for ensuring that they expose endpoints that can consume messages that are compliant with the service communication standards.
Service Communication Infrastructure
Enforces all non-functional capabilities for all messages that are compliant with the service communication standards, including mediation between those standards.

If a consumer or provider is not capable of being compliant with the service communication standards, adapters are leveraged not in the middle (the service communications infrastructure), but rather on the endpoints where the service consumers and service providers are deployed.
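
To illustrate an endpoint adapter, the sketch below assumes a legacy consumer that can only emit a pipe-delimited record, and an XML `order` message as the standard format. Both formats are invented for this example.

```python
import xml.etree.ElementTree as ET


def legacy_to_standard(record):
    """Adapter deployed with the legacy consumer: 'id|quantity' -> XML message."""
    order_id, quantity = record.split("|")
    root = ET.Element("order")
    ET.SubElement(root, "id").text = order_id
    ET.SubElement(root, "quantity").text = quantity
    # Serialize to a string so the message leaves the endpoint already compliant.
    return ET.tostring(root, encoding="unicode")
```

The communication infrastructure only ever sees the converted message, so it does not need special cases for the legacy format.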

Service Contract
The service contract is the collection of policies that govern the interaction between a service consumer and a service provider, much as a legal contract is used to govern the relationship between two parties. This includes the messaging schemas, such as those defined in a WSDL file for a Web Service, but must also include the policies that govern the run-time behavior, such as the expected usage by the consumer in an appropriate level of detail, as well as the expected response time from the provider when the system is behaving as expected. It is the responsibility of the run-time infrastructure to enforce the policies of the service contract.
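
A contract of this kind can be represented as a simple record so that gaps are easy to spot. This is a sketch under assumptions: the field names are illustrative, and treating an empty or zero value as "not yet agreed" is a convention chosen for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceContract:
    consumer: str
    provider: str
    message_schema: str = ""              # e.g. the location of a WSDL file
    version: str = ""
    expected_requests_per_minute: int = 0
    expected_response_ms: int = 0
    point_of_contact: str = ""

    def missing_terms(self):
        """Names of contract terms that are still unspecified (empty or zero)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

For instance, a draft that only names the schema and version would report the usage rate, response time, and point of contact as missing.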

Policies for Run-time Governance

The following are questions/policies that you should consider in your run-time governance efforts:

Policies for run-time governance
What is the normal rate of requests for a given service consumer?
What is the expected response time for the service provider for typical requests from that service consumer?
What actions are taken when the request rate for a given service consumer exceeds each of the agreed-upon thresholds?
What actions are taken when the response time for a given service consumer exceeds each of the agreed-upon thresholds?
Are there any time restrictions on when a particular consumer can access a service?
For services with multiple entry points via different technologies (e.g. SOAP/HTTP, XML/HTTP, SOAP/JMS), is policy enforcement defined and consistent (if needed) for each entry point?
Are all security policies configured and being enforced?
Are service requests routed to the appropriate version for each consumer, or are appropriate transformations applied to preserve backward compatibility?
Are all service messages being logged appropriately per any enterprise auditing requirements?
Are all service messages being logged and preserved for the purpose of debugging?
Are usage metrics being properly collected?
Are usage reports being generated and distributed appropriately?
Are the recipients of these reports properly reviewing them and accounting for any discrepancies in behavior?
Are all policies associated with message structure being enforced by the run-time infrastructure?
Are non-compliant messages being logged, rejected, and reported to appropriate personnel?
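
The threshold questions above can be made concrete with a small sliding-window counter. This is a sketch only: the class name, the two thresholds, and the action names `allow`, `alert`, and `reject` are assumptions, and the caller supplies the timestamps.

```python
from collections import deque


class RateMonitor:
    """Counts requests per consumer in a sliding window and picks an action."""

    def __init__(self, window_seconds, warn_at, reject_at):
        self.window = window_seconds
        self.warn_at = warn_at        # alert once more than this many are in the window
        self.reject_at = reject_at    # never accept more than this many in the window
        self.seen = {}                # consumer -> deque of accepted timestamps

    def record(self, consumer, now):
        stamps = self.seen.setdefault(consumer, deque())
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.reject_at:
            return "reject"           # rejected requests are not counted
        stamps.append(now)
        return "alert" if len(stamps) > self.warn_at else "allow"
```

With `RateMonitor(60, warn_at=2, reject_at=3)`, a consumer's third request within a minute triggers an alert and its fourth is rejected, while other consumers are unaffected.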

The four processes of governance

There are four key processes that must be executed as part of your governance processes: policy definition, communication and education, enforcement, and measurement and feedback.

Policy definition
Concerned with establishing the policies that the governance team feels will result in the desired behavior if they are followed. Without policy, the rest of the organization must either guess what the correct decisions are to get to the desired outcome, or involve someone from the governance team on every single project. The first option is unlikely to lead to success, and the second option both has scalability issues and is prone to variation based upon the “tribal knowledge” of the particular person from the governance team involved. Defining and documenting the policies is step one toward gaining consistency in the outcome.
Education and Communication
Just because the governance team has reached agreement and documented the policies doesn’t mean they’re going to be followed, or even known for that matter. A formal, planned communication effort is required to educate the organization on why you’re adopting SOA, the desired behavior you hope to achieve, and the policies that are being put in place to achieve it. It’s not a one-time presentation to all of IT, but rather a series of targeted communications for the various roles in the organization: large group presentations, small team presentations, blogs, wikis, and appropriate surveys and follow-ups to ensure that the communication is effective.
Enforcement
Even if your communication efforts are incredibly successful, you still need to put processes in place to ensure the policies are being followed. What you will find, however, is that the better job you do on communication and education, the easier your enforcement processes can be. If education is poor, enforcement will likely need to be more heavy-handed. Where possible, automated testing and reporting can make the processes more efficient and cost-effective.
Measurement and Feedback
The governance group must have measurement and feedback processes to ensure that progress is being made toward the desired behavior. If the desired behavior is not reached, something needs to be changed, and it could easily be the policies, the processes, or the people involved with governance. Accountability is lost if the team puts policies and processes in place, but then does nothing to verify that all that effort actually paid off.

SOA Governance Technologies

The role of SOA governance technology is not to be your governance, but rather to support your governance processes by making them more efficient. There are two areas where technology can be part of your SOA governance effort.

Registry/Repository

The first area is service metadata and policy management. The registry/repository allows you to track services, service consumers, and the policies that govern their interactions. It can be the critical tool in performing service portfolio management and service lifecycle management. Through domain modeling, process analysis, and other techniques, the portfolio can be populated with planned services, updated as they are implemented, and decommissioned when all consumers disappear, with the registry/repository being the tracking point.

In addition to service metadata, the registry/repository can also be used for policy management. Policies also have a lifecycle associated with them that must be managed. If the teams that are expected to adhere to policies aren’t aware the policies exist, your governance efforts will be sub-optimal.
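
The service lifecycle tracking described in this section can be sketched as a tiny in-memory registry. The class, its method names, and the three state names are assumptions for illustration; commercial registry/repository products expose their own models.

```python
class Registry:
    """Tracks services, their lifecycle state, and who consumes them."""

    def __init__(self):
        self.services = {}  # name -> {"state": str, "consumers": set}

    def plan(self, name):
        self.services[name] = {"state": "planned", "consumers": set()}

    def implement(self, name):
        self.services[name]["state"] = "implemented"

    def add_consumer(self, name, consumer):
        self.services[name]["consumers"].add(consumer)

    def remove_consumer(self, name, consumer):
        entry = self.services[name]
        entry["consumers"].discard(consumer)
        # An implemented service with no remaining consumers is decommissioned.
        if entry["state"] == "implemented" and not entry["consumers"]:
            entry["state"] = "decommissioned"
```

A real registry would likely flag the service for review rather than change its state automatically; the automatic transition here only keeps the sketch short.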

Policy Enforcement

The second area is policy enforcement, where multiple technologies can be leveraged.

Design Time

Technologies Description
Service Testing
Can evaluate service interfaces for compliance with enterprise policies (some registry/repository solutions can also do this at time of registration). A key factor to consider is integration with other design-time testing solutions, whether for functional testing, regression testing, load testing, or performance testing. These same test cases can also be leveraged at run-time for active monitoring of services in production.
Service Framework
Shared libraries that a developer leverages when constructing service consumers or service providers. These frameworks can make policy compliance automatic, and thus the path of least resistance.
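
As one example of a design-time interface check, the sketch below tests endpoint paths against a URL naming convention. The convention itself (`/services/<domain>/<service-name>/v<major version>`) is hypothetical; substitute whatever the reference architecture specifies.

```python
import re

# Hypothetical convention: /services/<domain>/<service-name>/v<major version>
URL_CONVENTION = re.compile(r"/services/[a-z]+/[a-z]+(-[a-z]+)*/v[0-9]+")


def non_compliant(endpoints):
    """Return the endpoint paths that break the naming convention."""
    return [path for path in endpoints if not URL_CONVENTION.fullmatch(path)]
```

Run as part of a build or at registration time, a check like this reports violations before a service reaches production.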

Run-Time

At run-time, there are three types of technologies that can be leveraged for enforcement of the run-time contract between a service consumer and service provider.

Technologies Description
Enterprise Service Bus (ESB)
Enables standards-based connectivity between consumers and providers, typically with stronger appeal to developers.
XML Appliance
These appliances tend to excel in XML security and threat protection, but may not have the flexibility of some ESB solutions.
Service Management Platforms
The focus here is on the instrumentation and analysis of service interactions, allowing alerts to be issued, requests to be throttled or prioritized, and reports to be generated. There is overlap with the ESB space, as many management platforms also provide gateways with similar capabilities, but the platforms typically have agents that can integrate nicely with your existing infrastructure, including ESBs, appliances, and hosting platforms.

Key Pitfalls for SOA Governance Efforts

Lack of Communication and Education
A common approach to governance is to assign some senior people to a review board and mandate that all projects go through a review as part of the development process. The problem with this is that the people involved frequently don’t define the policies that teams are required to follow, and even if they do, the policies aren’t communicated. As a result, the project teams are left to guess what the expectations are, and the reviews are likely unsuccessful.

No Formal Contracts
The interaction between a consumer and a provider is more than just making sure the messages are compatible. If multiple consumers are involved, the run-time interactions must be carefully managed to ensure that one badly behaved consumer (or provider) doesn’t cause all consumers to experience problems.
Lack of Service Ownership and Lifecycle Management
If the desire of the organization is to reuse existing services, nothing will stop that effort in its tracks more than having nobody responsible for the service after it is put into production. If it requires changes to support a new consumer, and no one is there to make those changes, or if funding cannot be allocated properly, the project teams will take the path of least resistance and build their own version of the service.
Lack of Analysis Outside of Projects
Not all services are created equal. Some services may be used by many consumers; others may have only one consumer. Treating all services as if they will have many consumers may cause an over-investment in many, just as holding off until the second consumer comes along can cause an under-investment. The proper way is to perform analysis outside of any particular project to model the organization and provide some context to make good decisions on the level of investment necessary for any given service.

About The Author

Todd Biske

Todd Biske is a Senior Enterprise Architect with Monsanto in St. Louis, Missouri. He has over 15 years of experience in Information Technology, both as a corporate practitioner and as a consultant, working with companies involved with Agriculture, Atmospheric Sciences, Financial Services, Insurance, and Travel and Leisure. His interests include Service Oriented Architecture, Systems Management Technologies, Usability, and Human-Computer Interaction. He has an M.S. degree in Computer Science from the University of Illinois at Urbana-Champaign, is a member of the SOA Consortium, is a frequent conference presenter, and writes a popular blog on strategic IT topics at http://www.biske.com/blog/

Recommended Book

SOA Governance

SOA Governance is the key to a successful adoption of service-oriented architecture. It is the process of establishing a desired outcome for your efforts, and then leveraging people, policies, and processes to make that outcome a reality. This includes technical policies and standards that guide your design-time activities, pre-project policies that impact your project selection and funding decisions, and finally, run-time policies that impact your operational management activities. The adoption of SOA is intended to improve the efficiency and productivity of your company, and your SOA governance efforts are critical in achieving your goals in quality, consistency, predictability, change management, and interdependencies of services.

