Using Variable Based Hints in an ODI Knowledge Module

February 15, 2013 1 comment

In an ODI interface, a knowledge module must be chosen. The knowledge module lays out the steps which are executed in order, and we can add “Options” and “Checks” to make it more generic (for more information, click here to view the documentation). In the knowledge module steps we use Oracle SQL statements, and these statements often contain SQL hints. So how can we implement hints in the SQL statements used in a knowledge module?

Since many interfaces use the same knowledge module, it should be generic.

If we add hints directly into the knowledge module as shown below, we would have to prepare a new knowledge module for each interface, and as you might guess, that approach gets messy quickly.

pic1

Instead, we can add KM options to the knowledge module, as my mentor has explained in his blog: http://gurcanorhan.wordpress.com/2013/02/05/adding-hints-in-odi/ .

That method is really useful when you need to use SQL hints in your SQL statements, but we thought we could enhance it a little to make it more practical. Consider that if we want to change a hint option's value, for example to increase the parallel degree of a select statement, we have to change the option in the interface. And if your production environment has only an execution repository, you cannot do that at all.

Gurcan and I thought about it and developed a new method to make the parallelism more generic: we store the hints of each interface in a database table. Consider a knowledge module named “INCREMENTAL UPDATE” with 10 steps and 15 DML statements in total, each of which contains a hint clause.

The database table looks like this:

pic2

For example, the TEST_INTERFACE interface uses the “INCREMENTAL UPDATE” knowledge module, and this knowledge module has 9 DML statements which can contain a hint clause.

The problem is how to access this database table and how to use the hint clauses in the knowledge module steps.

We know how to access the database in a knowledge module step, but how do we assign the hint clauses stored in the database to Java or Jython variables?

Step1

Add a step to the knowledge module to get the hint clauses from the database. Name the step “Get Hints”.

pic3

Step2

Select the “Command on Source” tab and write the SQL statement that fetches the hint clauses in the text area:

pic4

The SQL statement I wrote may seem a little complex, but it is not. The important point is that the query must return exactly ONE ROW; how many columns that row has does not matter. In fact, the query should return a row like this:

pic5

The other important part is that we use the

snpRef.getStep("STEP_NAME")

method to obtain the interface's name. We need it because, just before an interface starts running, the hints are read from the database table by interface name.
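The actual query is not reproduced here; as a hedged sketch only (the ODI_HINTS table, its columns, and the pivot approach are all assumptions, not the author's real schema), a lookup that collapses one hint row per hint number into a single row with HINT_1..HINT_n columns could be built like this:

```java
public class HintQueryBuilder {
    // Builds a query that pivots the per-interface hint rows into exactly
    // ONE row with HINT_1..HINT_n columns. Table and column names are
    // hypothetical placeholders.
    public static String buildHintQuery(String interfaceName, int hintCount) {
        StringBuilder sql = new StringBuilder("select ");
        for (int i = 1; i <= hintCount; i++) {
            if (i > 1) sql.append(", ");
            // max(case ...) collapses the pivoted rows into a single row
            sql.append("max(case when hint_no = ").append(i)
               .append(" then hint_text end) as HINT_").append(i);
        }
        sql.append(" from odi_hints where interface_name = '")
           .append(interfaceName).append("'");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildHintQuery("TEST_INTERFACE", 9));
    }
}
```

In the real KM step, the interface name would of course come from the substitution method rather than a hard-coded string.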

Step3

After you've completed the query in the source tab, select the target tab and configure the Java variables as shown below:

pic6

We define 9 Java variables, each of which holds a hint clause; the clauses are read from the database and assigned to the Java variables. Pay attention here:

String insertFlow_insertHint=”#HINT_1”;

It means: read the “HINT_1” field (column) of the SQL query result and assign it to the insertFlow_insertHint Java variable. Since each SQL column is assigned to a single Java variable, the SQL query must not return more than one row.

Step4

Now that we have the hint clauses in Java variables, the last point is how to use these Java bean variables in a SQL statement.

pic7

In a knowledge module step you can use the Java variables as shown above.
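Conceptually, the whole pattern can be sketched in plain Java (this is an illustration of the substitution idea, not ODI runtime code; the hint text and table names are invented):

```java
import java.util.HashMap;
import java.util.Map;

public class HintBindingDemo {
    // Splices the hint into a DML statement (table names hypothetical)
    static String buildDml(String hint) {
        return "insert " + hint + " into I$_TARGET select * from C$_SRC";
    }

    public static void main(String[] args) {
        // The single row returned by the "Command on Source" query:
        // column name -> hint text (values purely illustrative)
        Map<String, String> sourceRow = new HashMap<String, String>();
        sourceRow.put("HINT_1", "/*+ APPEND PARALLEL(T, 4) */");

        // "Command on Target": ODI replaces #HINT_1 with the column value
        // before the Java code runs; here we simulate that substitution.
        String insertFlow_insertHint = sourceRow.get("HINT_1"); // was "#HINT_1"

        // A later KM step can then use the variable inside its DML:
        System.out.println(buildDml(insertFlow_insertHint));
    }
}
```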

In this blog post I tried to show how to use variable-based SQL hints in ODI knowledge modules. Let's discuss what else can be made more generic!


Coherence Index Performance

In a traditional database we use indexes to access filtered data as quickly as possible. In Coherence, indexes are used for the same purpose: fast access to filtered data. If you wish to retrieve all the items in the cache, an index is probably not appropriate, just as with a database index. For example, in the Oracle database, if you use a filter to retrieve data but its selectivity is close to 1, the cost-based optimizer may choose not to use the indexes, since each index operation requires one more I/O. Likewise, in Coherence, when you fetch data you should decide carefully whether to use an index or not.

Anyway, in this blog post I will carry out basic filter operations with and without an index.

NamedCache cache = CacheFactory.getCache("persons");
PersonGenerator pg = new PersonGenerator();

cache.putAll(pg.getRandomPeople(2000000));
System.out.println("Cache loaded:"+cache.size()+" elements.");

Filter filter = 
	new OrFilter(
		new AndFilter(
			new EqualsFilter("getFirstName", "Hugo"),
			new GreaterFilter("getAge", 50)
		),
		new AndFilter(
			new EqualsFilter("getFirstName","William"),
			new LessFilter("getAge",50)
		)
		);

First, I connected to the cluster and the cache; after that, 2 million Person objects were generated and all put into the cache. Then the filter was created. Pay a little more attention here: the outermost filter is an OrFilter, and two AndFilters are nested inside it. If we converted this filter to an SQL query, it would be:

select * from people where ( firstName = 'Hugo' and age > 50 ) or ( firstName = 'William' and age < 50 )

As seen here, we filter only on the firstName and age fields. If these fields are indexed, we can improve the query performance.
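To make the boolean structure of the composite filter explicit, it evaluates each entry exactly like this plain-Java predicate (a logic-only sketch, not Coherence code):

```java
public class FilterLogicDemo {
    // Plain-Java equivalent of the OrFilter/AndFilter combination:
    // (firstName = 'Hugo' and age > 50) or (firstName = 'William' and age < 50)
    static boolean matches(String firstName, int age) {
        return ("Hugo".equals(firstName) && age > 50)
            || ("William".equals(firstName) && age < 50);
    }

    public static void main(String[] args) {
        System.out.println(matches("Hugo", 60));    // true
        System.out.println(matches("Hugo", 40));    // false
        System.out.println(matches("William", 30)); // true
    }
}
```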

To get the filtered data we use the entrySet method, passing the Filter object as an argument:

Set filteredData;
Timer.start();
filteredData = cache.entrySet(filter);
Timer.stop();
System.out.println("Without Index:" + Timer.getDurationMSec());

It prints:

Without Index:12091

The time is in milliseconds.
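Timer is not a standard JDK class; the author's helper isn't shown, so here is a minimal sketch of what such a static stopwatch might look like (an assumption, not the original class):

```java
public class Timer {
    private static long startNanos;
    private static long stopNanos;

    public static void start() { startNanos = System.nanoTime(); }
    public static void stop()  { stopNanos = System.nanoTime(); }

    // Elapsed time between start() and stop(), in milliseconds
    public static long getDurationMSec() {
        return (stopNanos - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        start();
        Thread.sleep(50);
        stop();
        System.out.println("Elapsed: " + getDurationMSec() + " ms");
    }
}
```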

So far we haven't used an index; it is time to add one:

cache.addIndex(new ReflectionExtractor("getFirstName"), false, null);
Timer.start();
	filteredData = cache.entrySet(filter);
Timer.stop();
System.out.println("With Index:" + Timer.getDurationMSec());

The addIndex method is used to add an index; its first parameter is the extractor for the getter of the field to be indexed. The second parameter is a boolean that determines whether the index is sorted. A sorted index is useful for range queries like "age > 20". We did not create a sorted index here because this index is only used for equality checks, but we'll use a sorted one in the next example.
This code part prints:

With Index:5300

As you see, with one index the entrySet method is about twice as fast as before, without an index.
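The sorted-index idea can be pictured with plain JDK collections: a sorted structure answers a range query like "age > 50" by jumping straight to the first qualifying key instead of touching every entry, roughly the way a TreeMap's tailMap does (an illustration only, not Coherence's internal implementation):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class SortedIndexSketch {
    public static void main(String[] args) {
        // toy "index": age -> number of people with that age
        TreeMap<Integer, Integer> ageIndex = new TreeMap<>();
        ageIndex.put(20, 3);
        ageIndex.put(45, 7);
        ageIndex.put(55, 2);
        ageIndex.put(70, 1);

        // "age > 50": start at the first key strictly greater than 50
        NavigableMap<Integer, Integer> over50 = ageIndex.tailMap(50, false);
        System.out.println(over50.keySet()); // prints [55, 70]
    }
}
```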
We know that our filter uses two fields: firstName, which we just indexed, and age. We can index the age field as well:

cache.addIndex(new ReflectionExtractor("getAge"), true, null);
Timer.start();	
	filteredData = cache.entrySet(filter);
Timer.stop();
System.out.println("With Two Indexes:" + Timer.getDurationMSec());

It prints:

With Two Indexes:1017

To get the filtered data set faster, we should index the object fields that are used in the filter. If we index fields that are not used in the filter, we won't get any performance improvement.

cache.removeIndex(new ReflectionExtractor("getFirstName"));
cache.removeIndex(new ReflectionExtractor("getAge"));

cache.addIndex(new ReflectionExtractor("getCitizenNumber"), true, null);
Timer.start();
filteredData = cache.entrySet(filter);
Timer.stop();
System.out.println("With Index BUT Not in the filter:" + Timer.getDurationMSec());

It prints:

With Index BUT Not in the filter:11391

11391 milliseconds is almost the same duration as without an index.

TROUG – Coherence Presentation and Demo Video

October 12, 2012 2 comments

Yesterday, I attended the Turkish Oracle User Group (TROUG) event as a speaker at Bahcesehir University, Istanbul. I talked about the cache concept, why we need caching, and a few Oracle Coherence features. I also covered Coherence cache topologies such as Replicated Cache, Distributed Cache and Near Cache, as well as the Write-Through, Read-Through, Write-Behind Queue and Refresh-Ahead mechanisms.

It was delightful, but at the end of my presentation I was disappointed by my VirtualBox. When I wanted to run a demo in Eclipse, which runs inside OEL 5 in VirtualBox 4, something went wrong: some of my Eclipse launch configurations had disappeared from the “Run As” section. There were lots of configurations, and I lost all of them.

Fortunately, I decided to upload a video which shows some Oracle Coherence features and CohQL examples inside VirtualBox.

You can watch this video on Vimeo:

https://vimeo.com/51314443

Here is my presentation.

I’m attending TROUG ( TuRkish Oracle User Group ) event as a speaker

September 14, 2012 Leave a comment

I will be attending the TROUG event as a speaker on 12 October at Bahcesehir University, Besiktas. There are two parallel sessions during the day: one for DBAs and the other for Oracle developers. I will be speaking about Oracle's advanced caching mechanism, namely Oracle Coherence.

http://www.troug.org/?p=278


How a member can use the existing cache in Coherence

August 23, 2012 1 comment

In the previous post, I explained how a member can join an existing cluster. Before reading this post, you should understand what was going on there.

At the end of the previous post, the new member had joined the cluster. In this post I'll explain how a member can use an existing cache (the customers cache which was created by the Eclipse project in this post).

Let's continue from the previous post.
Cluster Join
At the bottom of this screenshot you can see a prompt. To use the existing cache (the cache that has already been created, named customers) we type:

Map(?): cache customers

And it gives this output:

As you see, Coherence read the default cache configuration file instead of our cache config file. To force Coherence to read our cache configuration file, we should edit the JAVA_OPTS parameter in the coherence_join_cluster_test1.sh file (it was created in the previous post) and add this clause, changing the paths to the real ones:

-Dtangosol.coherence.cacheconfig=/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config/coherence-cache-config.xml -Dtangosol.pof.config=/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config/pof-config.xml

These xml files are under the src/config folder of the Eclipse project which has already been created.
You should also change the classpath parameters, because the pof-config.xml file references a class that must be on the classpath for the POF configuration to work.
To edit the classpath, open the coherence_join_cluster_test1.sh file, go to the line which starts with JAVA_EXEC, and change the -cp parameter to:

-cp "-classpath /home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config:/labs/wls1211/coherence_3.7/lib/coherence.jar:/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/bin" 

After editing the coherence_join_cluster_test1.sh file, save it as coherence_join_cluster_test2.sh. The coherence_join_cluster_test2.sh file is here:

Run this sh file, then type “cache customers” again at the prompt and you'll see this output:

As shown above, the loaded cache configuration file and POF configuration file are now the ones we wanted. When we query the size of the cache, it returns the correct answer:

Map (customers): size
1

Map (customers): 

Joining the Existing Cluster in Coherence

August 21, 2012 3 comments

In the previous post I tried to explain some of the output parameters of Coherence. In this post I will explain how a member can join an existing cluster by setting some parameters.

Before reading this post you should complete this one:
Running a Coherence Application in Eclipse
We've already run this example within Eclipse. Basically, it creates a cluster and a cache, then puts a custom object into the cache. Because the code in CustomerTest1.java then ends, the Coherence cache closes automatically. To prevent it from closing automatically, we'll edit CustomerTest1.java and add the code below to control the application flow.

String dummy;
Scanner user_input = new Scanner( System.in );
dummy = user_input.next( );

And the changed version of CustomerTest1.java is now here:

package com.afsungur.coherence.test;

import java.util.Scanner;

import com.afsungur.coherence.objects.Customer;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CustomerTest1 {

    public static void main(String[] args) {
        Customer cust = new Customer("Ahmet Fuat", "Sungur", "Bahcelievler/Istanbul", "afsungur@gmail.com", "5321231231", 34111);
        NamedCache cache = CacheFactory.getCache("customers");
        cache.put(1, cust);
        System.out.println("Cache Size:" + cache.size());

        // Block until the user types something, keeping the cache alive
        String dummy;
        Scanner user_input = new Scanner(System.in);
        dummy = user_input.next();

        CacheFactory.shutdown();
    }
}

After editing, you can run the application inside Eclipse; you will see that the application waits for input before terminating the cache server. Before terminating it, we can let another member join the cluster.

By the way, you can check which parameters were used to start the cluster inside Eclipse, as shown below:

[oracle@localhost bin]$ ps -ef | grep coherence
oracle    6835  2873  8 13:18 ?        00:00:02 /labs/wls1211/jdk160_29/bin/java -Dtangosol.coherence.cacheconfig=src/config/coherence-cache-config.xml -Dtangosol.coherence.cluster=MyFirstCluster -Dtangosol.pof.config=/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config/pof-config.xml -Dtangosol.coherence.clusterport=7252 -Dfile.encoding=UTF-8 -classpath /home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config:/labs/wls1211/coherence_3.7/lib/coherence.jar:/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/bin com.afsungur.coherence.test.CustomerTest1
oracle    6874  5387  0 13:19 pts/1    00:00:00 grep coherence
[oracle@localhost bin]$ 

So, how can we add a member to the cluster? We know that there is a cluster serving on port 7252, we know the cluster name (MyFirstCluster), and so on. We can check these parameters from Run -> Run Configurations -> Coherence (Main tab) -> Other (sub-tab), or with the “ps” Linux command as shown above.

Under the bin folder of the Coherence home (something like coherence3.7/bin) there are some sh files. Both cache-server.sh and coherence.sh can be used to join the cluster. coherence.sh provides an interactive environment to query the cache (put, get, list and so on), so we use coherence.sh to join the cluster.

Before running coherence.sh, we edit the file, add the arguments below to the line which starts with “JAVA_OPTS=”, and save the file as coherence_join_cluster_test1.sh.

-Dtangosol.coherence.cluster=MyFirstCluster -Dtangosol.coherence.clusterport=7252

Then, start coherence_join_cluster_test1.sh and analyse the output:
Cluster Join

Most of the output is similar to a single-member cluster, but there are a few differences. First of all, at the top of the output you see that the new member is using the default operational configuration file, which is inside the jar file. In the middle of the output, in the MasterMemberSet section, you can see how many members are currently in the cluster, the current member id (the ThisMember part), the oldest member id, and so on. At the bottom of the page you can see the interactive command line with “Map (?):” written on the left side. Type “help” to see all available commands.

In this post I've explained how a member joins an existing cluster. In the next post I'll explain how the newly joined member can use the existing cache.

Some explanations of Oracle Coherence console output

August 17, 2012 1 comment

I will explain some parts of the console output that I shared in the previous post.

Console Output

The first three rows are related to the tangosol-coherence-override settings. This file is part of the Oracle Coherence configuration and it controls many features of the Coherence cache server, such as multicast and TCP parameters.

As shown above, because our project has a tangosol-coherence-override.xml, the Coherence server uses that file. If this file does not exist in the project folder (src/config, as in the previous example), Coherence uses the default file inside coherence.jar, as shown below:

2012-08-15 03:57:07.976/0.652 Oracle Coherence 3.7.1.1 &lt;Info&gt; (thread=main, member=n/a): Loaded operational configuration from "jar:file:/labs/wls1211/coherence_3.7/lib/coherence.jar!/tangosol-coherence.xml"
2012-08-15 03:57:08.075/0.752 Oracle Coherence 3.7.1.1 &lt;Info&gt; (thread=main, member=n/a): Loaded operational overrides from "jar:file:/labs/wls1211/coherence_3.7/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2012-08-15 03:57:08.077/0.753 Oracle Coherence 3.7.1.1 &lt;D5&gt; (thread=main, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2012-08-15 03:57:08.081/0.757 Oracle Coherence 3.7.1.1 &lt;D5&gt; (thread=main, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

You can also see which file is used for the Coherence cache configuration. As shown below, Coherence uses our coherence-cache-config file, which is under the src/config directory of the project.

2012-08-15 04:38:31.548/1.081 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=n/a): Loaded cache configuration from "file:/home/oracle/labs/Coh_labs/workspace/CohExam_AFSungur/src/config/coherence-cache-config.xml"

Another important point is the warning you may see in the console. These warnings are related to the socket buffer parameters in Linux. I haven't changed any Linux parameters yet, but to clear the warnings I will have to change a few. If you didn't see any socket-related warnings, skip this section.

To increase the socket buffer size in Linux, follow these steps:
edit the /etc/sysctl.conf file and add the two lines below (check first whether they are already there):

# increase TCP max buffer
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

After editing, reload the settings:

[root@localhost ~]# /sbin/sysctl -p

After running that, you will see the changed parameters.

Let's go back to the console. We see TCMP and the cluster name. TCMP is a protocol that Coherence uses. It has many features, such as multicast and unicast communication, and it detects a new server that has just joined the cluster or a server that has just dropped from it.

The cluster-name parameter gives the cluster a name, so you can create multiple clusters on the same network. In our example the cluster name looks like a string of weird hex characters, because we didn't specify any name for the cluster. To give the cluster a name, edit the tangosol-coherence-override.xml file and add these lines:

<cluster-config>
      <member-identity>
         <cluster-name system-property="tangosol.coherence.cluster">
            MyCluster</cluster-name>
      </member-identity>
</cluster-config>
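Note the system-property="tangosol.coherence.cluster" attribute in the XML: it means the value can also be overridden from the command line with -Dtangosol.coherence.cluster=... . In plain Java terms this is just a JVM system property (a sketch of the mechanism, not Coherence code):

```java
public class ClusterNameProperty {
    public static void main(String[] args) {
        // Normally supplied as -Dtangosol.coherence.cluster=MyFirstCluster;
        // set programmatically here only for illustration.
        System.setProperty("tangosol.coherence.cluster", "MyFirstCluster");

        // When the property is absent, the value from the XML file applies
        String name = System.getProperty("tangosol.coherence.cluster", "MyCluster");
        System.out.println("Cluster name: " + name);
    }
}
```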


After adding this to tangosol-coherence-override.xml, rerun the application and cache servers; you'll see this output:

2012-08-15 04:38:35.707/5.240 Oracle Coherence GE 3.7.1.1 <Info> (thread=Cluster, member=n/a): Created a new cluster "MyFirstCluster" with Member(Id=1, Timestamp=2012-08-15 04:38:32.417, Address=10.0.2.15:8088, MachineId=2063, Location=site:,process:4244, Role=AfsungurCoherenceCustomerTest1, Edition=Grid Edition, Mode=Development, CpuCount=1, SocketCount=1) UID=0x0A00020F000001392A13EC61080F1F98
2012-08-15 04:38:35.712/5.245 Oracle Coherence GE 3.7.1.1 <Info> (thread=main, member=n/a): Started cluster Name=MyFirstCluster

As shown above, many attributes are displayed. For example, you can see the IP address of the member, the machine id, the member id (very useful when you design your load-balancing mechanism), the CPU and socket counts, and the role information. The CPU and socket counts are 1 because I ran Coherence on a VM with a single CPU. The role name allows an application to group Coherence members into specific roles, for example cache clients and cache servers. And the site name (location) enables intelligent routing, load balancing and disaster-recovery planning when you are using WAN clustering, which is really cool.

In the MasterMemberSet section there is information about the members which are connected to the cluster.

At the bottom of the console you can see that the POF configuration is loaded from the file under the src/config folder of the project.