BI & Warehousing
Where are my sub templates?
Everything is a Bit Bucket
@oraclenude
oraclenude.crisatunity.com
In response to Chet's frustration over yet another encounter with a database agnostic, I wanted to contribute my first article to the oraclenerd franchise. My thoughts seemed too long for the comment stream.
I ascribe the kernel of thought behind "the database is a bit bucket" primarily to each and every database vendor that ever existed. Every database vendor, in an effort to persuade users of competitive products to adopt their product, has participated willingly in espousing some core aspect of how "same as the other guy" their product is in addition to whatever differentiation pitch they have.
Now, the generally weak-minded and lazy developer (yes, I think the majority of developers are in fact weak-minded and lazy) latches on to the vendor's selective "sameness" claims for professional and personal reasons. (full disclosure: I am both an Oracle DBA and .NET developer)
Professionally, because they are financially invested in writing third-party code not database code. To them, the less they spend learning and understanding the particulars of things like databases, operating systems, networks, human beings, etc. the better. Personally, because there is a dominate thread in the culture of developers to dismiss the database as interesting or meaningful. It is a form of heresy to show affection towards any platform in any specificity.
This is why Java's Big Lie of "write once, run any where" swoons so many. Java's Big Lie is analogous to "the database is a bit bucket" by declaring that even the language of software code should be as absolutely interchangeable as possible - even at the expense of being cost-effective or useful. There is an unquestioned faith that decoupling everything from everything is a good thing. This faith gives us code that is as far from the simplest thing that could work from the first moment writing the code is undertaken. It is a faith I reject. That's why I'm an ORACLENERD.
P.S. I know oraclenude and oraclenerd is confusing. It's supposed to be.
Google gets Barcodey
Multiple OBIEE Environments
For any large OBIEE project the normal set-up would involve having a development machine, a test machine, Production machines and some form of disaster recovery. These normally need to be on separate physical machines, mainly for access rights reason and of course the DR machines need to be in a separate building (hopefully in a separate city!)
Did you know that you can have multiple OBIEE environments, but without buying new boxes?
Why do I need multiple Environments?I am not suggesting that development and Production are on the same box, but there are situations that demand environments, such as:
Multiple Projects - The success of many a good OBIEE project will often lead to other departments in the organisation wanting in on the action. Rather than build a whole new development, UAT and Production box you can just add more services to the existing one.
Clustering - although you often cluster for performance, you should also cluster for availability. The normal process is cluster over two or more boxes, but you can also cluster on the same box across users.
Development Cycles – Sometimes you need to create and test a version of your config (rpd /webcat) for a particular release, but carry on developing for the next release. This calls for multiple development environments.
Integration Testing - You can create your repository, but will it work in UAT and production, particularly if they have SSO and your dev environment does not? I prefer to have a pre-UAT environment in place for Developer testing, or internal testing teams to use.
Sandbox - The last thing you want is for all developers to use the same master development repository to implement new models or methods. Get the developer to work on their own copy, or new one, in a separate area to prove that it works.
Worldwide Development – I am not a big fan of MUD. It will work in certain situations, but you normally find that someone hogs the lock on the master repository. The main issue is new business models. One way around this is to have a master repository and have developers around the world use a copy to build their sections, then merge in their changes when they are tested.
Production Support – In a controlled environment the developers do not have access to production (I normally set the production rpd to read only just in case!)
Demo Site – To help your users understand what is possible you can install the sample sites and give all your potential users access.
Training Site – To support training you often need to build a separate environment. You may not want the hassle of whole box to do this on.
How Do I Create Multiple OBIEE Environments?Linux
On a Linux box you can install multiple environments into individul user accounts. There is no need to use vitualisation.
When you install a new OBIEE into a separate user you need to make sure that OC4J is not running. Other than that it is a normal simple installation.
After install you have to update all the ports in the system, making sure that you have the correct settings for BI Server, Presentation server, Javahost and graph server. The most important thing is to make sure you do not create spaghetti! Keep a central document on the environments and all their ports.
You can then update the cluster settings as normal, and put the webcat into a shared area for clustering.
We have created a script that does a silent installation and updates all the ports numbers.
Windows
Now there’s another story! My advice is stick to Unix or Linux for now. In theory you can create multiple services to run but I havn’t tried it yet. The simple solution would be to use virtualisation.
If you give it a go let me know how you get on.
APEX: Create and Parse Arrays
A question came up today whether we could get multiple values into a single variable (Item in APEX).
Yes we can!
APEX_UTILSNeed some data first:
CREATE TABLE t ( some_text VARCHAR2(10) );I want that table data to be in a single item. TABLE_TO_STRING is your function.
INSERT INTO t ( some_text )
SELECT dbms_random.string( 'a', 10 ) some_text
FROM dual
CONNECT BY LEVEL <= 5;
CJUSTICE@TESTING>SELECT * FROM t;
SOME_TEXT
----------
thrFXviVWJ
kpfGRRwctv
EVxNrcmBHC
gcBlHaKrLa
irYduOZfkS
VAR C VARCHAR2(100);Easy enough. How about converting it back to a table? STRING_TO_TABLE is your answer.
DECLARE
l_table APEX_APPLICATION_GLOBAL.VC_ARR2;
BEGIN
SELECT some_text
BULK COLLECT INTO l_table
FROM t;
:c := apex_util.table_to_string( p_table => l_table );
END;
/
PL/SQL procedure successfully completed.
C
-----------------------------------------------------------
thrFXviVWJ:kpfGRRwctv:EVxNrcmBHC:gcBlHaKrLa:irYduOZfkS
DECLAREDone.
l_table APEX_APPLICATION_GLOBAL.VC_ARR2;
BEGIN
l_table := apex_util.string_to_table( p_string => :c );
FOR i IN 1..l_table.COUNT LOOP
d( 'value ' || i || ': ' || l_table(i) );
END LOOP;
END;
/
value 1: thrFXviVWJ
value 2: kpfGRRwctv
value 3: EVxNrcmBHC
value 4: gcBlHaKrLa
value 5: irYduOZfkS
PL/SQL procedure successfully completed.
The "Database is a Bucket" Mentality
Friend of mine asked me last week for some advice, specifically asking if there was a tool to convert Oracle SQL Syntax to the ANSI SQL syntax. (A quick search turned up this (it was the first result), if you're interested).
I had to ask why.
Client is switching to an open source database, i.e. "free." Oracle licensing is way too pricey.
I'm sure Oracle costs a lot of money, it's pretty darn good software. Quite possibly the best in the world especially in the database realm. I've written about the incredibly feature rich goodness that is the Oracle database here here...actually, just trust me. It's in my name.
Why is there even a comparison?
Could it be that everyone out there believes that the sole purpose of a database is to store data? That it can't do anything else? The storage and retrieval of data...that's all it does of course.
It's like saying the Democrats and Republicans are the same...at face value, perhaps, but the devil is in the details.
This, this "Bit Bucket" mentality is what is so incredibly frustrating.
I am no position to argue the differences between the various flavors of database, I lack the experience. But if I were using SQL Server, I would leverage the shit out of it's capabilities. If I were using MySQL, I would leverage the shit out of it's capabilities. If I were using Firebird, I would leverage the shit out of it's capabilities. Same goes for every single flavor out there. Get my point here?
The database is NOT a bit bucket!
Do I need to use more 4-letter words?
I know that Oracle is feature rich and that 99% percent of your code can live in the database...think APEX and PL/SQL. You could probably put ALL of your code inside the database if you wanted to put the javascript in BLOBs as well.
Please, please please quit telling me they are the same...they are not.
Follow up rant by Mr. O'Neill can be found on this following post Everything is a Bit Bucket
Thoughts on Change Data Capture
In little over a month I will be in Las Vegas speaking at Collaborate 10. There is a lot of BI / DW talks this year and for the first time with BIWA Training Days branding. Rittman Mead will be there at the conference giving talks on each of the conference days. If you are at the conference (or even just on vacation there) then come and say ‘Hi’ to Stewart, Venkat, Mark and myself.
My talk will be about Realtime Data Warehousing – it is an overview of reasons, techniques and pitfalls, but I do cover a lot of material in that hour. Of course, Change Data Capture (CDC) will be a major part of the talk; Oracle has so many options here including their recently acquired GoldenGate product set. As always, the slides will be here on the Rittman Mead site soon after I speak.
My colleague, Stewart Bryson has also had some recent thoughts about change data capture over on the TDWI group at LinkedIn.com (group membership needed); he was quite preceptive (and on the money, in my opinion) with his comment “I would hesitate to let technical limitations dictate user requirements. In today’s BI/DW market, there are very few technical limitations that cannot be solved one way or another.”
One of points I will make in my Realtime DW talk, and perhaps I need a few more slides to do it justice, is the need to profile the change you capture on the source system. Often there is a lot of “noise” that looks like change but you have no real interest in it at the data warehouse. Not all systems are “well behaved”; I have seen systems that always update a record even if nothing has changed and even systems that update each column as separate statement with its own commit. Of course, even systems that don’t have those vices can still have columns that have no DW significance being updated and see those changes being filtered out on the data warehouse after we had already done a lot of work (processing, network bandwidth and the like) to get the data there.
The more I do this kind of work I feel there is a need to switch CDC on on the live source for a while and see the typical patten of change that occurs in a day, week, period whatever and then make decisions on how to handle this defensively downstream. Do we need to exclude certain columns that are just “noise”? What will be the impact of multiple, rapidly-occurring commits on how we handle SCD-2 dimensions? Of course we can predict what will see and come up with a proposed solution but the real source often has a few surprises up its sleeve – once a customer gave me a sequence of order statuses that an order passed through in its life-cycle except that on the actual source system the order sequence was not the same as their documentation and that would impact our reporting.
Code Comment WTF? Part 209
-- ********************************Seriously? Was that necessary? Could I possibly be under the illusion that it is not the end of the package?
-- End of Package Body
END package_pkg ;
/
Stop it.
Now.
APEX: LDAP Authentication
I knew APEX had an LDAP authentication scheme (and don't know the full history of the project so I can't (won't) comment on why it wasn't used). So I fired up my local sandbox just to see how easy or hard it was. Admittedly, I have always avoided anything to do with LDAP...not sure why (plate is full?). I used this as a guide.
Anyway, it was remarkably easy.
Setup
APEX: 3.2.1
Web Server: Apache (OHS)
Database:
BANNERFirst I fired up the web server:
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production
C:\oracle\http\opmn\bin>opmnctl startOpened up APEX, and created a new application. For authentication schemes I chose "No Authentication."
opmnctl: opmn started
C:\oracle\http\opmn\bin>opmnctl startproc process-type=HTTP_Server
opmnctl: starting opmn managed processes...
After I had created the application, I went into Shared Components --> Authentication Schemes --> Create
Select the default and click Next

Select "Show Login Page and Use LDAP Directory Credentials" and click Next

I've already done this so I'm selecting my current Login page, 11, click Next

Enter your LDAP Host and your DN:

Your DN String should look something like this (from article above):
cn=%LDAP_USER%,l=amer,dc=oracle,dc=comMake sure you use the %LDAP_USER% after the cn= portion of the string.
Name it ldap_test, click Create Scheme:

You will then be redirected back to the list of Authentication Schemes, ldap_test should now be current

To test it just run your application and login using your LDAP (AD) credentials

Success!

Top 10 Hyperion Events of the Decade is Over
That said, I’m sure every single person who read this series disagreed with at least part of it. I’m suspecting that with the exception of the #1 event being the Hyperion/Oracle merger (I reject your reality and substitute my own!), you probably don’t like the order, you can’t believe I included some of these items, and you’re shocked I forgot a few events. Feel free to eviscerate me in the comment section of this blog. Not to say I enjoy that sort of thing, but at least it shows you care enough to flame.
Hopefully, the next ten years will bring even more innovation while providing far less upheaval. Unless Oracle wants to buy up SAP and sell them off for parts...
February Top 10
John deserves quite a bit of recognition so the more I can provide, the better.
TitlePageviewsUnique PageviewsEBS Install Guide - Part I541394Learn Oracle Business Intelligency (OBIEE)444337EBS Install Guide - Part 2347262DBMS_CRYPTO: Example339295APEX: Install 3.2.1 on 11gR2263225EBS Install Guide - Part 3253202OBIEE Posts243215OBIEE: How to Migrate Your rpd234183BULK COLLECT and FORALL187166
SQL Developer: Install Unit Testing Repository
I'm not a big tools guy, I prefer SQL*Plus to anything else. I especially don't like paying for tools (yes, the database is a tool and costs a lot of money...I do realize the hypocrisy).
After Syme Kutz's presentation at SOUG, I've been looking more closely at SQL Developer. I've been using it (and JDeveloper) since they were both made freely available a few years ago. Mostly for the schema browsing, looking around, importing and exporting data. I do use it (SQL Developer) to write reports that I can share with the Business folks as well.
Syme's presentation was primarily on Unit Testing (which I begged for). First step to using Unit Testing is to install the repository, a set of tables the application uses to build and store tests and their results.
You need to have version 2.1 or greater.
First up, go to Tools --> Unit Test --> Select Current Repository

You'll be prompted to select a connection (i.e. database) to use

Would you like to create one now? Select Yes.

You're then told the the required roles do not exist, select OK.

Confirm running SQL

Running...will take just a few seconds

Success!

That's it. Easy right? Future posts will detail managing users and creating tests.
Oracle BI EE 10.1.3.4.1 – Solutions – Puzzle 4
The 4th puzzle in this series was a reasonably simple one which basically demonstrates a capability that allows end users to bypass the security applied in the BI Server layer. There are 3 possible solutions for this
Solution 1: Using Evaluate
EVALUATE was introduced in the 10.1.3.3.1 version of BI EE, that allows end users to call database functions directly. This feature has a lot of uses and actually is widely used. One such use case (others might term this as a bug) is its ability to call table columns that are not in the repository, by using native database functions. For example, the report shown below is a very simple one where only dimensional attributes CHANNEL_CLASS and CHANNEL_TOTAL exist in the report

The SQL for this report is give below
select distinct T4167.CHANNEL_TOTAL as c1,
T4167.CHANNEL_CLASS as c2
from
CHANNELS T4167
order by c1, c2
In our case CHANNEL_DESC column is completely not available for reporting as that is blocked by Security in the presentation layer as shown below

To bypass this, in one of the columns of the above report, use an EVALUATE function shown below
EVALUATE(’DECODE(%1,”Dummy”,NULL,CHANNEL_DESC)’ AS CHAR(10),Dim.CHANNEL_CLASS)
This will basically bypass the security of the BI Server and will provide a means of looking at the CHANNEL_DESC column directly. The report and the SQL are given below

select distinct T4167.CHANNEL_TOTAL as c1,
T4167.CHANNEL_CLASS as c2,
DECODE(T4167.CHANNEL_CLASS,'Dummy',NULL,CHANNEL_DESC) as c3
from
CHANNELS T4167
order by c1, c2, c3
So far so good. Now, lets try adding a measure to this report. You will notice that this will start producing an OCI error.

The reason is since CHANNEL_DESC was not part of the report directly, BI Server did not include that as part of the Select or the Group By clause. The wrong SQL is given below
select T4167.CHANNEL_TOTAL as c1,
T4167.CHANNEL_CLASS as c2,
DECODE(T4167.CHANNEL_CLASS,'Dummy',NULL,CHANNEL_DESC) as c3,
sum(1) as c4
from
CHANNELS T4167
group by T4167.CHANNEL_CLASS, T4167.CHANNEL_TOTAL
order by c1, c2, c3
Now, to make this work even when a fact attribute is included, the only option is to somehow push this column inside an Aggregate function like SUM, MAX etc since we cannot make the BI Server to generate the Group By (for the secured column). This requires a non-secure dimensional attribute at the same grain as the secured column. In our case, both CHANNEL_CLASS and CHANNEL_DESC have one to one relation and hence i will use that itself to demonstrate. The idea is to encapsulate the EVALUATE expression inside a string aggregate function like MAX etc as shown below
MAX(EVALUATE(’DECODE(%1,”Dummy”,NULL,CHANNEL_DESC)’ AS CHAR(10),Dim.CHANNEL_CLASS))

select T4167.CHANNEL_TOTAL as c1,
T4167.CHANNEL_CLASS as c2,
max(DECODE(T4167.CHANNEL_CLASS,'Dummy',NULL,CHANNEL_DESC)) as c3,
sum(1) as c4
from
CHANNELS T4167
group by T4167.CHANNEL_CLASS, T4167.CHANNEL_TOTAL
order by c1, c2
How do we disable this ability completely as this might be considered as a security bug? Ideally i would like to have a privilege in Answers, that can basically stop the use of EVALUATE functions thereby providing us with the capability of controlling who has access to this. But since this is not currently available, the easiest approach is to make sure that all your query columns are pushed into a sub-query. There are a multiple ways we can push all the columns to a sub-query. I will list them below
1. Using Logical Columns
2. Using Level Based Measures
3. Using SELECT based tables
There are other techniques as well. But for the sake of keeping this brief, i will show how all columns can be pushed into Sub-Queries using SELECT tables (instead of the normal tables obtained through Import). The idea is to use a simple table based on SELECT as shown below

instead of the normal imported table. Then we cannot use EVALUATE as the EVALUATE function can be pushed only to the sub-query. For example, the SQL for the same report above, using SELECT table instead of normal table is given below
select distinct T5419.CHANNEL_TOTAL as c1,
T5419.CHANNEL_CLASS as c2
from
(SELECT
CHANNEL_CLASS,
CHANNEL_CLASS_ID,
CHANNEL_ID,
CHANNEL_TOTAL,
CHANNEL_TOTAL_ID
FROM
CHANNELS) T5419
order by c1, c2
As you see, whatever function we apply in the front-end will always be pushed only to the outer query (inner sub-query will always remain the same which is for the SELECT table). Hence other than the columns in the Select sub-Query, EVALUATE cannot get external columns residing in the actual table (CHANNEL_DESC for example). If we try using the same EVALUATE function we will get an error. But this approach has potential performance issues since for every query, depending on the database, predicates might not get pushed from outer query to inner sub-query(within the optimizer) thereby causing performance issues.
People who answered this correctly: Craig, Anu
Solution 2: Using BYPASS_NQS authentication
This is not something that is normally used. But this is another important example where when the authentication model gets changed, the entire BI EE security can be bypassed. This security model allows any user to login to BI EE. But only database users will be allowed to report out of the databases(login to BI EE with the same username/password as the database). When this is done, any security that is applied at the column level will be bypassed and hence everyone can see the secured column data as well

Solution 3: Impersonation & Proxy Authentication
For this solution to work, one needs to have the privilege to do proxy authentication. Also, one cannot call this exactly as a security bypass since the user requires the capability to proxy in as another user. But there are cases when logged in as a single user, to look at the actual report values (during report development) we might want to login as Administrator which will essentially bypass all the security that is applied at the column level in BI Administrator. For more details on Proxy Authentication refer my blog post here.
The 5th Puzzle in this series to follow later this week.
#1. Oracle Acquires Hyperion
The Hyperion acquisition was one of the smoothest I’ve ever seen thanks to a couple of factors. Oracle extended offers to 92% of the Hyperion workforce so all the key personnel made it into Oracle. Oracle also put some great Hyperion people in charge of BI and EPM at Oracle including John Kopcke and Robert Gersten. While the old Oracle BI/OLAP guard was not amused, I’m sure, Hyperion provided new life to the BI group at Oracle when they were put over the top of the existing leadership (rolling up to Thomas Kurian in Fusion Middleware).
Oracle has made some great decisions since the acquisition: Essbase got its name back (may “Hyperion System 9 BI+ Analytic Services” burn in the 7th circle of hell for eternity) as simply Oracle Essbase, Smart View became the Office standard for BI products, Hyperion Planning became the standard for budgeting at Oracle, Financial Management became the strategic direction for financial consolidation, and more. The only negative things I can really say about the acquisition are the loss of the annual Solutions conference and the disappearance of the Hyperion User Groups, and by this point, I’m almost nitpicking.
Like it or not, Oracle’s acquisition was by far the most significant event in Hyperion’s decade. While I never thought I'd say this, I thank God that Larry Ellison decided to forgo buying a bigger yacht in favor of a $3.3BB EPM software firm.
OBIEE: Default Answers Template?
Here's the post on OTN. I started to get nervous too, I posted on Friday and hadn't had a response...until today. 3 days? Man...that's way too long!
Here's the short of it.
Our reports were coming out funny. Dimension column headings had one style and the Fact table column headings had another.

Using Firebug, I could easily isolate the sections.
On the Dimension column, the definition looked like this:
<thThe Fact table column was defined as:
class="ColumnHdg"
style="background-color: rgb(231, 231, 247); font-size: 9px; color: rgb(0, 51, 102);"
scope="col"
dir="ltr">Product Desc
</th>
<thNote the style attribute...that overrides any class settings. Very annoying.
class="ColumnHdg"
scope="col"
dir="ltr">Basis Amount
</th>
I thought it would be relatively simple to fix. I worked with custom messages before, this had to be similar. So I began to "grep" the messages directory
c:\oraclebi\web\msgdb\>findstr /i /m /s /c:"background-color" *.*Nothing.
How about looking for the name of the class, ColumnHdg?
c:\oraclebi\web\msgdb\>findstr /i /m /s /c:"columnhdg" *.*So I start with criteriatemplates.xml and find the reference to columnHdg (just now realizing that the case doesn't match...oh well). That was part of the WebMessage kuiColumnFormulaEditor. So I searched for that...
messages\criteriatemplates.xml
messages\formattemplates.xml
messages\mktgadminuitemplates.xml
messages\mktgcommontemplates.xml
messages\mktglistformattemplates.xml
messages\mktgsegmenttemplates.xml
You see where I'm going.
That lead me to the javascript files (of which there are tons). Nothing...not a single thing that could possibly be adding this style attribute.
That's when I mapped the dev server drive to my local computer and opened up WinMerge
I then began to compare every single file in both the msgdb and res (javascript) folders. I would then compare the files that were different to see if that could be the cause. Still...nothing.
I had looked in the webcat before, but couldn't find anything of global significance. I was headed back there though since I had lost hope with custom messages and/or javascript.
Then I got a message from the OTN Forum Administrator...could it be? Looked at the name of the person who answered it first, John Minkjan...sweet!Looks like you forgot to reset the OOB settings when installing OBIEE:
have a look here to reset them:
http://obiee101.blogspot.com/2009/02/obiee-editing-system-wide-defaults.htmlClick through, follow his instructions, bounce the server and voila!

As of this writing, I don't know what OOB stands for; I could make something up, but it probably wouldn't be appropriate for this family site.
Thanks John!
Uninstalling Excel Analyzer
HUGmn techday2010 - March 11 in Minneapolis
Alan Lee will be speaking on the new features coming in Essbase 11.1.2 and Karen Zubetz will discussing the new OBI Applications Essbase Integrator which will, in essence, spin Essbase cubes from Oracle BI Server sources. Mike Malwitz will talk about what is new, and what is coming, with HFM. My presentations are on Essbase 11 installation and on converting a VBA application to Dodeca. Here is the full agenda for the meeting: http://www.harbinger-group.com/HUGpage.html#.
I will be in Minneapolis for a couple of days and I have promised myself I would take a bit of time to enjoy my trip. I am planning to visit the climbing gym in St Paul on Wednesday night (Vertical Endeavors - http://www.verticalendeavors.com/st_paul/index.html) and am planning to ski on Thursday night (Afton Alps - http://www.aftonalps.com). Anyone interested in joining me?
MultiSheet Excel Output
OBIEE: XML File as Data Source
So here goes.
You'll need 2 files: a XML file and a XSL stylesheet (which defines the XML file). I found said files here. I'll also provide them here (cdcatalog.xml) and here (cdcatalog.xls).
In the Administration tool, go to File --> Import --> from Database

When you prompted to select a datasource, select XML and the window should look like this

For URL, browse to the location you saved cdcatalog.xml, do the same for XSLT

Hit OK and you'll be prompted with the Import dialog

Import the "table" and you're done.

Inside the Oracle BI Server Part 3 : BI Server In-Memory Joins
In the previous two postings in this series, I looked at the architecture of the Oracle BI Server, and how it processes incoming queries from Oracle BI Answers. In the latter article I touched on the concept of BI Server in-memory joins, and in this article I want to expand on this topic and look at just what goes on when the BI Server is called upon to combine data from multiple sources.
When the BI Server executes a query plan, it handles the data in four separate stages:

- Firstly, filters and functions are applied to the data from each data source
- Then, the data from these data sources are aggregated as required
- Then they are joined together (or “stitched” together), and
- Then, any calculations and/or aggregations that are applied across data sources are applied
In simple OBIEE environments, data used by a request will come from a single database, and therefore any joins that need to be performed by the BI Server will automatically be “pushed down” to the underlying database. In the cases though where more than one physical database is being used to provide data for a query, this join will instead need to be performed by the BI Server “in memory”. This ability to “federate” data sources, and therefore produce reports and analysis that span multiple data sources, but present the data to users as if it was a single database, is one of the key unique features of OBIEE and sets it apart from tools like Discoverer which are really restricted to reporting against single data sources.
So given this capability, how does it work under the covers? When does the BI Server perform a join in-memory, and when does it get done at the underlying database level? Where can we see what is happening, and can we predict what method the BI Server will use when performing a join? Finally, what algorithm does the BI Server use when performing these joins, and how does it use memory and disk when during the process?
To illustrate how the process works, there are a number of join scenarios that we need to consider. Some relate to joining fact and dimension tables together, and others relate to joining fact tables that share conforming dimensions, or hold conforming data sets of differing granularity.
Joining Fact and Dimension Tables Together
The BI Server semantic layer requires you to organize your business model and mapping layer into a star schema. This star schema may have one or more logical dimension tables, that join to one or more logical fact tables. The logical fact tables typically have conforming dimensions, so that you can create requests that span multiple fact tables and multiple dimension tables.
Taking for the moment joins between fact and dimension tables, depending on how the underlying physical or logical table source joins are set up in the semantic model, these may be either inner joins, left outer joins, right outer joins or full outer joins. The simple example to consider is a business model that is mapped to a single physical database, so that all logical table sources point to the same underlying data source, as shown in the screenshot below:

In this case, if we issued a request against this business model that required data from a dimension table and a fact table, the BI Server would push the join between logical table sources down to the underlying database, a single SQL query would be generated and the execution plan from a level 5 query log entry would look like this:
-------------------- Execution plan: RqList <<2105>> [for database 3023:2820:orcl3,44] PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2820,44], sum(SALES.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2820,44] Child Nodes (RqJoinSpec): <<2136>> [for database 3023:2820:orcl3,44] PRODUCTS T2874 SALES T2911 DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0] GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44] OrderBy: c1 asc [for database 3023:2820,44]
The same would apply to a left outer join between table sources in the same database, a right outer join or a full outer join. The BI Server doesn’t do any work here except to issue a single SQL query, and you can see just the one “RqList” (request list) in the execution plan, indicating again that the BI Server thinks it only needs to put together one query to satisfy the request.
If, however, one of the logical dimension tables had its logical table source re-pointed to a separate physical database, as shown in the screenshot below, the BI Server would now have to do the join itself, as it can’t be pushed down to the underlying database (as there are now two of them).

In this case, two SQL queries would be issued, one against each of the two physical databases, and the BI Server would do the join in-memory (or to disk, I’ll elaborate on this later on). The corresponding logical execution plan from a level 5 log file would now look like this:
-------------------- Execution plan: RqBreakFilter <<2465>>[1] [for database 0:0,0] RqList <<2466>> [for database 0:0,0] D1.c2 as c1 [for database 3023:2500,44], sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0] Child Nodes (RqJoinSpec): <<2478>> [for database 0:0,0] ( RqList <<2482>> [for database 0:0,0] D902.c1 as c2 GB [for database 3023:2500,44], D901.c2 as c3 [for database 3023:132,44], D901.c3 as c5 [for database 3023:132,44] Child Nodes (RqJoinSpec): <<2490>> [for database 0:0,0] ( RqList <<2495>> [for database 3023:132:orcl,44] SALES.PROD_ID as c2 [for database 3023:132,44], sum(SALES.QUANTITY_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:132,44] Child Nodes (RqJoinSpec): <<2504>> [for database 3023:132:orcl,44] SALES T211 GroupBy: [ SALES.PROD_ID] [for database 3023:132,44] OrderBy: c2 asc [for database 3023:132,44] ) as D901 InnerJoin <<2492>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ] ( RqList <<2517>> [for database 3023:2500:orcl2,44] PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2500,44], PRODUCTS.PROD_ID as c2 [for database 3023:2500,44] Child Nodes (RqJoinSpec): <<2523>> [for database 3023:2500:orcl2,44] PRODUCTS T2502 OrderBy: c2 asc [for database 3023:2500,44] ) as D902 OrderBy: c2, c3 [for database 0:0,0] ) as D1 OrderBy: c1 asc [for database 0:0,0]
Notice the “InnerJoin <<2492>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ]“ that is in the middle of the execution plan, between the two main Rqlists – this tells you that the BI Server is doing the join, as it would only appear here if it couldn’t be pushed down to the underlying database. You might also find references to LeftOuterJoin, RightOuterJoin and FullOuterJoin here, depending on how the join between the tables is defined in the physical or logical table source joins in your semantic layer.
Joining Facts with Conforming Dimensions Together
Another situation occurs when you are joining fact tables together that share conforming dimensions. A simple example of this is where you create a request that requires data from two or more fact tables that share conforming dimensions, such as those shown in the screenshot below:

As requests such as these can potentially lead to “fan trap” issues (explained in this blog post), the BI Server knows that it has to generate two logical queries and join, or “stitch” them together to avoid the fan trap. If both fact tables are sourced from the same physical database, and this database supports subquery factoring (the “WITH” clause that you see in Oracle 10gR2/11g SQL statements) then it will generate the following execution plan, which has a FullOuterStitchJoin between the two inner RqList blocks:
RqBreakFilter <<3571>>[3] [for database 0:0,0]
RqList <<3462>> [for database 3023:2820:orcl3,46]
D1.c1 as c1 GB [for database 3023:2820,46],
D2.c1 as c2 GB [for database 3023:2820,46],
case when D1.c2 is not null then D1.c2 when D2.c2 is not null then D2.c2 end as c3 GB [for database 3023:2820,46]
Child Nodes (RqJoinSpec): <<3567>> [for database 3023:2820:orcl3,46]
(
RqList <<3474>> [for database 3023:2820:orcl3,46]
sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46],
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46]
Child Nodes (RqJoinSpec): <<3507>> [for database 3023:2820:orcl3,46]
PRODUCTS T2874
COSTS T2830
DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,46]
) as D1 FullOuterStitchJoin <<3565>> On D1.c2 = D2.c2
(
RqList <<3511>> [for database 3023:2820:orcl3,46]
sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46],
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46]
Child Nodes (RqJoinSpec): <<3544>> [for database 3023:2820:orcl3,46]
PRODUCTS T2874
SALES T2911
DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,46]
) as D2
OrderBy: c3 asc [for database 3023:2820,46]
The BI Server Navigator then generates a single SQL statement off of this execution plan, which queries both fact tables using subquery factoring, and then brings the results together in the main body of the statement:
-------------------- Sending query to database named orcl3 (id: <<3462>>):
WITH
SAWITH0 AS (select sum(T2830.UNIT_COST) as c1,
T2874.PROD_SUBCATEGORY_DESC as c2
from
PRODUCTS T2874,
COSTS T2830
where ( T2830.PROD_ID = T2874.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC),
SAWITH1 AS (select sum(T2911.AMOUNT_SOLD) as c1,
T2874.PROD_SUBCATEGORY_DESC as c2
from
PRODUCTS T2874,
SALES T2911
where ( T2874.PROD_ID = T2911.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC)
select distinct SAWITH0.c1 as c1,
SAWITH1.c1 as c2,
case when SAWITH0.c2 is not null then SAWITH0.c2 when SAWITH1.c2 is not null then SAWITH1.c2 end as c3
from
SAWITH0 full outer join SAWITH1 On SAWITH0.c2 = SAWITH1.c2
order by c3
If the physical database doesn’t support subquery factoring, such as Oracle Database 10gR1 or higher, then the BI Server generates a slightly different execution plan, again with a FullOuterStitchJoin, like this:
-------------------- Execution plan:
RqBreakFilter <<3115>>[3] [for database 0:0,0]
RqList <<3006>> [for database 0:0,0]
D903.c1 as c1 GB [for database 3023:2820,44],
D903.c2 as c2 GB [for database 3023:2820,44],
case when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end as c3 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3117>> [for database 0:0,0]
(
RqList <<3160>> [for database 0:0,0]
D901.c1 as c1 GB [for database 3023:2820,44],
D902.c1 as c2 GB [for database 3023:2820,44],
D901.c2 as c3 [for database 3023:2820,44],
D902.c2 as c4 [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3163>> [for database 0:0,0]
(
RqList <<3018>> [for database 3023:2820:orcl3,44]
sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44],
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3051>> [for database 3023:2820:orcl3,44]
PRODUCTS T2874
COSTS T2830
DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44]
OrderBy: c2 asc [for database 3023:2820,44]
) as D901 FullOuterStitchJoin <<3109>> On D901.c2 = D902.c2; actual join vectors: [ 1 ] = [ 1 ]
(
RqList <<3055>> [for database 3023:2820:orcl3,44]
sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44],
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3088>> [for database 3023:2820:orcl3,44]
PRODUCTS T2874
SALES T2911
DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44]
OrderBy: c2 asc [for database 3023:2820,44]
) as D902
) as D903
OrderBy: c3 asc [for database 0:0,0]
This is then resolved for this database into two separate SQL statements, which then joined “in-memory” together by the BI Server.
-------------------- Sending query to database named orcl3 (id: <<3018>>):
select sum(T2830.UNIT_COST) as c1,
T2874.PROD_SUBCATEGORY_DESC as c2
from
PRODUCTS T2874,
COSTS T2830
where ( T2830.PROD_ID = T2874.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC
order by c2
+++Administrator:2a0000:2a0005:----2010/02/28 15:05:31
-------------------- Sending query to database named orcl3 (id: <<3055>>):
select sum(T2911.AMOUNT_SOLD) as c1,
T2874.PROD_SUBCATEGORY_DESC as c2
from
PRODUCTS T2874,
SALES T2911
where ( T2874.PROD_ID = T2911.PROD_ID )
group by T2874.PROD_SUBCATEGORY_DESC
order by c2
Joining Table Sources within a Logical Fact
Another situation is a fact table may have more than one logical table source, because individual measures are sourced from different data sources or perhaps measures may be mapped in at differing levels of granularity (this blog post describes such a scenario). In this case, again the BI Server will initially try and push the join down to the underlying database, something that may be possible if a single physical database is used and we can use a technique like subquery factoring; more likely though it will require the BI Server to issue two or more physical SQL statements and then bring the results back together again using a FullOuterStitchJoin.
-------------------- Execution plan:
RqList <<7829>> [for database 0:0,0]
D1.c1 as c1 [for database 0:0,0],
D1.c2 as c2 [for database 0:0,0],
D1.c3 as c3 [for database 0:0,0],
D1.c4 as c4 [for database 3023:4210,44]
Child Nodes (RqJoinSpec): <<7842>> [for database 0:0,0]
(
RqList <<7809>> [for database 0:0,0]
D1.c1 as c1 [for database 0:0,0],
D1.c2 as c2 [for database 0:0,0],
D1.c3 as c3 [for database 0:0,0],
D1.c4 as c4 [for database 3023:4210,44],
D1.c5 as c5 [for database 0:0,0]
Child Nodes (RqJoinSpec): <<7824>> [for database 0:0,0]
(
RqBreakFilter <<7808>>[1,2,5] [for database 0:0,0]
RqList <<7604>> [for database 0:0,0]
case when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end as c1 GB [for database 0:0,0],
case when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end as c2 GB [for database 0:0,0],
D903.c5 as c3 GB [for database 0:0,0],
D903.c6 as c4 GB [for database 3023:4210,44],
case when D903.c7 is not null then D903.c7 when D903.c8 is not null then D903.c8 end as c5 GB [for database 0:0,0]
Child Nodes (RqJoinSpec): <<7844>> [for database 0:0,0]
(
RqList <<7915>> [for database 0:0,0]
D901.c1 as c1 [for database 0:0,0],
D902.c1 as c2 [for database 3023:4210,44],
D902.c2 as c3 [for database 3023:4210,44],
D901.c2 as c4 [for database 0:0,0],
D901.c3 as c5 GB [for database 0:0,0],
D902.c3 as c6 GB [for database 3023:4210,44],
D901.c4 as c7 [for database 0:0,0],
D902.c4 as c8 [for database 3023:4210,44]
Child Nodes (RqJoinSpec): <<7918>> [for database 0:0,0]
(
RqList <<7851>> [for database 0:0,0]
D1.c2 as c1 [for database 0:0,0],
D1.c3 as c2 [for database 0:0,0],
D1.c1 as c3 GB [for database 0:0,0],
D1.c4 as c4 [for database 0:0,0]
Child Nodes (RqJoinSpec): <<7854>> [for database 0:0,0]
(
RqBreakFilter <<7687>>[2,3] [for database 0:0,0]
RqList <<8040>> [for database 0:0,0]
D1.c1 as c1 [for database 0:0,0],
D1.c2 as c2 [for database 0:0,0],
D1.c3 as c3 [for database 0:0,0],
D1.c4 as c4 [for database 0:0,0]
Child Nodes (RqJoinSpec): <<8058>> [for database 0:0,0]
(
RqList <<7972>> [for database 3023:4483:Quotas,2]
sum(QUANTITY_QUOTAS.QUOTA by [ CATEGORY.CATEGORY, MONTHS.MONTH_MON_YYYY] ) as c1 [for database 3023:4483,2],
MONTHS.MONTH_MON_YYYY as c2 [for database 3023:4483,2],
CATEGORY.CATEGORY as c3 [for database 3023:4483,2],
MONTHS.MONTH_YYYYMM as c4 [for database 3023:4483,2]
Child Nodes (RqJoinSpec): <<7682>> [for database 3023:4483:Quotas,2]
CATEGORY T4486
MONTHS T4488
QUANTITY_QUOTAS T4492
DetailFilter: CATEGORY.CATEGORY = QUANTITY_QUOTAS.CATEGORY and MONTHS.MONTH_YYYYMM = QUANTITY_QUOTAS.MONTH_YYYYMM [for database 0:0]
GroupBy: [ CATEGORY.CATEGORY, MONTHS.MONTH_YYYYMM, MONTHS.MONTH_MON_YYYY] [for database 3023:4483,2]
) as D1
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c1 asc, c2 asc [for database 0:0,0]
) as D901 FullOuterStitchJoin <<7800>> On D901.c1 =NullsEqual D902.c1 and D901.c2 =NullsEqual D902.c2; actual join vectors: [ 0 1 ] = [ 0 1 ]
(
RqList <<7880>> [for database 3023:4210:orcl4,44]
D2.c2 as c1 [for database 3023:4210,44],
D2.c3 as c2 [for database 3023:4210,44],
D2.c1 as c3 GB [for database 3023:4210,44],
D2.c4 as c4 [for database 3023:4210,44]
Child Nodes (RqJoinSpec): <<7883>> [for database 3023:4210:orcl4,44]
(
RqBreakFilter <<7760>>[2,3] [for database 3023:4210:orcl4,44]
RqList <<7989>> [for database 3023:4210:orcl4,44]
sum(ITEMS.QUANTITY by [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY] ) as c1 [for database 3023:4210,44],
TIMES.MONTH_MON_YYYY as c2 [for database 3023:4210,44],
PRODUCT.CATEGORY as c3 [for database 3023:4210,44],
TIMES.MONTH_YYYYMM as c4 [for database 3023:4210,44]
Child Nodes (RqJoinSpec): <<7755>> [for database 3023:4210:orcl4,44]
PRODUCT T4256
TIMES T4264
ITEMS T4239
ORDERS T4248
DetailFilter: ITEMS.ORDID = ORDERS.ORDID and ITEMS.PRODID = PRODUCT.PRODID and ORDERS.ORDERDATE = TIMES.DAY_ID [for database 0:0]
GroupBy: [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY, TIMES.MONTH_YYYYMM] [for database 3023:4210,44]
) as D2
OrderBy: c1 asc, c2 asc [for database 3023:4210,44]
) as D902
) as D903
OrderBy: c1, c2, c5 [for database 0:0,0]
) as D1
OrderBy: c5 asc, c2 asc, c4 asc [for database 0:0,0]
) as D1
Again, notice the FullOuterStitchJoin in the execution plan – this indicates that facts (as opposed to facts and dimensions) are being joined together.
This in turn leads to two separate SQL statements. The one against the “orcl” database is more complex because the results then need to be mapped to the aggregation level that the second source, “quotas”, comes in at:
-------------------- Sending query to database named Quotas (id: <<7972>>):
select sum(T4492."QUOTA") as c1,
T4488."MONTH_MON_YYYY" as c2,
T4486."CATEGORY" as c3,
T4488."MONTH_YYYYMM" as c4
from
"CATEGORY" T4486,
"MONTHS" T4488,
"QUANTITY_QUOTAS" T4492
where ( T4486."CATEGORY" = T4492."CATEGORY" and T4488."MONTH_YYYYMM" = T4492."MONTH_YYYYMM" )
group by T4486."CATEGORY", T4488."MONTH_YYYYMM", T4488."MONTH_MON_YYYY"
+++Administrator:2b0000:2b000a:----2010/02/24 17:18:51
-------------------- Sending query to database named orcl4 (id: <<7880>>):
select D2.c2 as c1,
D2.c3 as c2,
D2.c1 as c3,
D2.c4 as c4
from
(select D1.c1 as c1,
D1.c2 as c2,
D1.c3 as c3,
D1.c4 as c4
from
(select sum(T4239.QUANTITY) as c1,
T4264.MONTH_MON_YYYY as c2,
T4256.CATEGORY as c3,
T4264.MONTH_YYYYMM as c4,
ROW_NUMBER() OVER (PARTITION BY T4256.CATEGORY, T4264.MONTH_MON_YYYY ORDER BY T4256.CATEGORY ASC, T4264.MONTH_MON_YYYY ASC) as c5
from
PRODUCT T4256,
TIMES T4264,
ITEMS T4239,
ORDERS T4248
where ( T4239.ORDID = T4248.ORDID and T4239.PRODID = T4256.PRODID and T4248.ORDERDATE = T4264.DAY_ID )
group by T4256.CATEGORY, T4264.MONTH_MON_YYYY, T4264.MONTH_YYYYMM
) D1
where ( D1.c5 = 1 )
) D2
order by c1, c2
So, to summarize things so far:
- Where possible, the BI Server will try and generate a single SQL statement to resolve a request
- And if possible, any joins that are required between tables will be pushed down to the database
- If table data sources are located on separate physical databases, the BI Server will request the individual data source data blocks, and then join the results together in-memory using an inner, left outer, right outer or full outer join as appropriate
- If facts (or measures within a fact) are being joined together, the BI Server will need to generate one logical query per logical table source, and bring the data together with a full outer stitch join
- As mentioned above, if it’s possible to do this stitch join at the database level (using, for example, a WITH clause), it’ll do so
- Otherwise the BI Server will generate separate SQL statements and join the data together in-memory
When an in-memory BI Server join happens between two tables, it will bring back both sets of data from the two (or more) table sources and then perform a sort-merge join to bring the data together. If possible, it will push the sort back to the underlying database and just do the “merge” part of the join, and it’ll in all likelihood page some of the temporary data to TMP files in $ORACLEBIDATA/tmp depending on the load on the server, available memory and the number of concurrent queries that it is running. The NQSConfig.INI BI Server parameter VIRTUAL_TABLE_PAGE_SIZE determines the point at which temporary data is paged to disk, and on a Unix server you can experiment with increasing it from its default setting if you have lots of unused memory available (the docs suggest that this will probably not have much of a positive effect, though).
Fragmented Data Sources
Another variation on a join that the BI Server can do is a “union” between two queries. This is most common when you have fragmented data sources, such as the example below where part of the data in the sales table comes from one table, and part from another.

In this case, the logical execution plan will contain RqUnionAll between the inner RqList request lists, to show that the BI Server knows it needs to union all the two queries.
-------------------- Execution plan:
RqList <<7569>> [for database 3023:6594:orcl7,44]
D3.c2 as c1 GB [for database 3023:6594,44],
sum(D3.c3 by [ D3.c2] ) as c2 GB [for database 3023:6594,44]
Child Nodes (RqJoinSpec): <<7695>> [for database 3023:6594:orcl7,44]
(
RqList <<7613>> [for database 3023:6594:orcl7,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44],
SALES_UPTO_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44]
Child Nodes (RqJoinSpec): <<7617>> [for database 3023:6594:orcl7,44]
PRODUCTS T6596
SALES T6629
DetailFilter: PRODUCTS.PROD_ID = SALES_UPTO_2003.PROD_ID [for database 0:0]
RqUnion All <<7690>> [for database 3023:6594:orcl7,44]
RqList <<7668>> [for database 3023:6594:orcl7,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44],
SALES_BEYOND_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44]
Child Nodes (RqJoinSpec): <<7672>> [for database 3023:6594:orcl7,44]
PRODUCTS T6596
SALES T6637
DetailFilter: PRODUCTS.PROD_ID = SALES_BEYOND_2003.PROD_ID [for database 0:0]
) as D3
GroupBy: [ D3.c2] [for database 3023:6594,44]
OrderBy: c1 asc [for database 3023:6594,44]
Then, depending on whether the BI Server can resolve this using a single query or multiple queries against separate data source, either a single SQL statement like the one below will be issued, or separate statements will be issued and the BI Server will do the union all in memory.
select D3.c2 as c1,
sum(D3.c3) as c2
from
((select T6596.PROD_SUBCATEGORY_DESC as c2,
T6629.AMOUNT_SOLD as c3
from
PRODUCTS T6596,
SALES T6629 /* SALES_UPTO_2003 */
where ( T6596.PROD_ID = T6629.PROD_ID )
union all
select T6596.PROD_SUBCATEGORY_DESC as c2,
T6637.AMOUNT_SOLD as c3
from
PRODUCTS T6596,
SALES T6637 /* SALES_BEYOND_2003 */
where ( T6596.PROD_ID = T6637.PROD_ID ) )
) D3
group by D3.c2
order by c1
Driving Tables (Parameterized Nested Loop Joins)
I mentioned in the paragraph above that BI Server joins are typically done using the sort-merge algorithm. One variation on this though is when you set one of the two tables in a business model and mapping logical join to be a driving table, typically because you are federating fact and dimension tables and one table is much smaller than the other, as shown in the screenshot below.

The first thing to understand with driving tables is that they are regarded as a “hint” by the BI Server, and the BI Server may well choose to ignore the setting if it makes more sense to perform the join as normal (presumably, when both tables are relatively small). If the driving table instruction is followed, though, the BI Server will always do the join in-memory, even if both tables come from logical table sources pointing to the same physical database. In the execution plan shown below, you can see the InnerJoin (left drive) that indicates a parameterized nested loop join (PNLJ) will be required, and as the name suggests the BI Server will perform a nested loop join rather than the sort-merge join that it usually uses to join tables together.
-------------------- Execution plan:
RqBreakFilter <<8705>>[1] [for database 0:0,0]
RqList <<8972>> [for database 0:0,0]
D1.c2 as c1 [for database 3023:2500,44],
sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0]
Child Nodes (RqJoinSpec): <<8984>> [for database 0:0,0]
(
RqList <<8463>> [for database 0:0,0]
D901.c1 as c2 GB [for database 3023:2500,44],
D902.c2 as c3 [for database 3023:5035,44],
D902.c3 as c5 [for database 3023:5035,44]
Child Nodes (RqJoinSpec): <<8707>> [for database 0:0,0]
(
RqList <<8757>> [for database 3023:2500:orcl2,44]
PRODUCTS.PROD_NAME as c1 GB [for database 3023:2500,44],
PRODUCTS.PROD_ID as c2 [for database 3023:2500,44]
Child Nodes (RqJoinSpec): <<8760>> [for database 3023:2500:orcl2,44]
PRODUCTS T2502
DetailFilter: PRODUCTS.PROD_NAME = '128MB Memory Card' or PRODUCTS.PROD_NAME = '3 1/2" Bulk diskettes, Box of 100' or PRODUCTS.PROD_NAME = '5MP Telephoto Digital Camera' or PRODUCTS.PROD_NAME = '64MB Memory Card' or PRODUCTS.PROD_NAME = 'Deluxe Mouse' or PRODUCTS.PROD_NAME = 'Envoy Ambassador' or PRODUCTS.PROD_NAME = 'Envoy External 8X CD-ROM' or PRODUCTS.PROD_NAME = 'Martial Arts Champions' or PRODUCTS.PROD_NAME = 'Model A3827H Black Image Cartridge' or PRODUCTS.PROD_NAME = 'Model C93822D Wireless Phone Battery' or PRODUCTS.PROD_NAME = 'Model CD13272 Tricolor Ink Cartridge' or PRODUCTS.PROD_NAME = 'PCMCIA modem/fax 28800 baud' or PRODUCTS.PROD_NAME = 'SIMM- 16MB PCMCIAII card' or PRODUCTS.PROD_NAME = 'Smash up Boxing' or PRODUCTS.PROD_NAME = 'Unix/Windows 1-user pack' [for database 0:0]
OrderBy: c2 asc [for database 3023:2500,44]
) as D901
InnerJoin (left drive) <<8806>> On D901.c2 = D902.c2; actual join vectors: [ 1 ] = [ 0 ]
(
RqList <<8790>> [for database 3023:5035:orcl5,44]
SALES.PROD_ID as c2 [for database 3023:5035,44],
sum(SALES.AMOUNT_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:5035,44]
Child Nodes (RqJoinSpec): <<8793>> [for database 3023:5035:orcl5,44]
SALES T5126
DetailFilter: SALES.PROD_ID = ?1 or SALES.PROD_ID = ?2 or SALES.PROD_ID = ?3 or SALES.PROD_ID = ?4 or SALES.PROD_ID = ?5 or SALES.PROD_ID = ?6 or SALES.PROD_ID = ?7 or SALES.PROD_ID = ?8 or SALES.PROD_ID = ?9 or SALES.PROD_ID = ?10 or SALES.PROD_ID = ?11 or SALES.PROD_ID = ?12 or SALES.PROD_ID = ?13 or SALES.PROD_ID = ?14 or SALES.PROD_ID = ?15 or SALES.PROD_ID = ?16 or SALES.PROD_ID = ?17 or SALES.PROD_ID = ?18 or SALES.PROD_ID = ?19 or SALES.PROD_ID = ?20 [for database 0:0]
GroupBy: [ SALES.PROD_ID] [for database 3023:5035,44]
OrderBy: c2 asc [for database 3023:5035,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c1 asc [for database 0:0,0]
Then then leads to the following parameterized SQL statements being issued, with the first statement representing the “driving” query, and the second the “probing” one against the larger table.
-------------------- Sending query to database named orcl2 (id: <<8757>>):
select T2502.PROD_NAME as c1,
T2502.PROD_ID as c2
from
PRODUCTS T2502
where ( T2502.PROD_NAME in ('128MB Memory Card', '3 1/2" Bulk diskettes, Box of 100', '5MP Telephoto Digital Camera', '64MB Memory Card', 'Deluxe Mouse', 'Envoy Ambassador', 'Envoy External 8X CD-ROM', 'Martial Arts Champions', 'Model A3827H Black Image Cartridge', 'Model C93822D Wireless Phone Battery', 'Model CD13272 Tricolor Ink Cartridge', 'PCMCIA modem/fax 28800 baud', 'SIMM- 16MB PCMCIAII card', 'Smash up Boxing', 'Unix/Windows 1-user pack') )
order by c2
+++Administrator:2c0000:2c000a:----2010/02/24 21:06:47
-------------------- Sending query to database named orcl5 (id: <<8790>>):
select T5126.PROD_ID as c2,
sum(T5126.AMOUNT_SOLD) as c3
from
SALES T5126
where ( T5126.PROD_ID in (:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, :PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, :PARAM18, :PARAM19, :PARAM20) )
group by T5126.PROD_ID
order by c2
In reality you rarely see driving table joins being used as there are much better solutions to bringing together small and large tables together – the main one being to co-locate the tables and then push the join down to the database, rather than bring both datasets together and have the BI Server join them in memory instead (this also applies to a lesser degree to all BI Server joins). But this could be a useful “quick fix” until such time as you can co-locate the data, and its useful to remember that these types of joins are always done by the BI Server due to the need to iterate through drive/probe operations.
Persist Connnection Pools
One final variation on BI Server execution plans and join types is when you set up a “persist connection pool”. Persist connection pools are typically used in two scenarios; firstly, where Oracle/Siebel Marketing is being used, and secondly, where the underlying physical database doesn’t handle large numbers of values in an IN-list. In this case, you can set up a second connection pool within a physical database and specify it as the persist connection pool, as shown in the screenshot below:

I’ve never encountered a persist connection pool “in the wild”, so to speak, but an example query log output from when one was used is shown below. In this instance, the first query was sent to a MS Analysis Services database, and a persist connection pool was used to materialize the in-list results into a database table which is then joined back to the ORDERS table in the final query, rather than have the BI Server do the join in-memory.
-------------------- Sending query to database named FoodMart (id: <<10980>>):
With
member [Measures].[YearAnc] as 'ancestor([Time].Currentmember,[Time].[Year]).name'
set [Q] as '{{[Time].[Year].members}}'
select
{[measures].[YearAnc]} on columns,
{[Q]} on rows
from [Sales]
-------------------- Sending query to database named SQLDB_Northwind (id: CreateTable TransGateway):
CREATE TABLE TTCH5C5DEL554110000020000003 ( column1 VARCHAR2(8) )
-------------------- Sending query to database named SQLDB_Northwind (id: <<11057>>):
select distinct TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') as c1
from
Orders T1864
where ( TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') in (select column1 from TTCH5C5DEL554110000020000003) )
Conclusions
So, there you have it. The join strategy of the BI Server, as is the case with functions and calculations, is to wherever possible push them down to the underlying database. If this can’t be done, because either the database version doesn’t support features like subquery factoring, or if the data for the request is being sourced from more than one physical databas, the BI Server will do the join itself, initially in-memory but usually with temporary data being paged to disk.
There are two main types of BI Server join; regular (inner, left outer, right outer and fullouter) joins for bringing together fact and dimension tables; and full outer stitch joins, for bringing together facts and measures. There are also variations for handling joins from very small tables to very large tables (driving tables, or parameterized nested loop joins), or when the physical database doesn’t support large in-lists, however these issues are usually better handled by co-locating data or upgrading the database.
Finally, even though the BI Server is pretty clever at doing these types of joins, you’re usually better trying to invest your time in physically bringing your data together into a data mart or data warehouse than spending too much time fine-tuning these joins, though a knowledge of how they work (and how to read a level 5 execution plan) can be useful if you have to understand, or tune, an existing system in-place. Of course the level 5 execution plan doesn’t really tell you anything you couldn’t determine by looking at the design of the RPD – there’s nothing that goes on beyond this that might change the execution plan for a certain set of data, unlike the Oracle database which changes the plan from database to database depending on the distribution and nature of the data – but its interesting to get a peek into the workings of the BI Server Navigator module.


