Monday, June 26, 2017

Debugging Sparkl applications - Pentaho JCR Repository Synchronizer real world example




It's undeniable that I really love the Sparkl concept - it allows for very fast development of applications / extensions using technologies that we are very familiar with. And I'm amazed by the quality of the contributions that were recently built with it. BTable and AAAR are great examples.

I'll now show an example of how to debug and fix bugs in Sparkl apps. You'll see that it's much, much easier than doing it at the Java level, where development knowledge is required.

A real use case - Fixing a PRS bug



A user recently reported a bug in PRS (Pentaho Repository Synchronizer), saying it didn't work for him. After some investigation, we found out that the problem was caused by where his BA server was installed - something like "c:\Program Files (64)\pentaho..". Yeah, Windows.... Still, a bug!

By changing my solution file to a file with weird chars, I was able to replicate the bug. This is what I see on the screen:


Clearly there's an error in the data that feeds that table. Let's figure out what's wrong then.

Identifying the problem in Sparkl


Starting from Sparkl, let's see what query feeds that table. Start by editing the Pentaho Repository Synchronizer project.



In the elements, select the main dashboard and edit it.


By doing that, we'll go to CDE. Switching to the components, we can select the jcr2file table.


We can see it's using the diffTableQuery. In the datasources we can check which endpoint it refers to.


So the culprit is the previewDifferences endpoint. Back in Sparkl, we can find that endpoint in the list.


By running that endpoint (in this case the default parameter values work out of the box; in other cases we may need to pass them explicitly), we'll see the returned JSON:



This is clearly wrong. It's returning the full path on my file system. The paths should be relative to the repository (the stuff under repositorySynchronizer). So somewhere in the logic this is not being handled properly.

Identifying the problem in PDI


The whole backend of Sparkl applications is built using Kettle. There's physically a file called previewDifferences.ktr that we can simply open in PDI.






A Sparkl endpoint runs the associated transformation and outputs the result set of the step called OUTPUT. There's a subtransformation there, which we can inspect and execute.

In the specific case of PRS, in order to run the transformations we need to add some extra libs to the Kettle lib dir: libpensol.jar and libpenson.properties, which can be found in solution/system/repositorySynchronizer/resources/lib, and some bi-platform jars (core, api and repository) that are in the BI server's WEB-INF/lib.

When we run the list_and_compare transformation we see that we get the same output that we obtained through Sparkl:


This means that somewhere earlier something is failing when matching what is in JCR against what's on the file system. After some previewing / investigation we see that the filename_without_location_dest isn't actually removing the main location!



That's a UDJE (User Defined Java Expression) snippet, and there's something wrong with it. It uses the following expression:

org.pentaho.di.core.vfs.KettleVFS.getInstance().getFileObject(filename).toString().replaceAll("\\\\","/").replaceAll("^"+elementPath.replaceAll("\\\\","/"),"")

Ah! It uses a regular expression! And our path has parentheses. So this code is treating the parentheses not as literal characters but as a regexp capture group. This is causing the problem.
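To make the failure concrete, here is a minimal standalone Java sketch of the same pattern, outside Kettle. The class name, method names and sample paths are hypothetical and only illustrate the behaviour. It also shows `Pattern.quote`, a standard-library way to make the prefix match literally; that is just one possible alternative, not the fix that was applied to PRS.

```java
import java.util.regex.Pattern;

public class RegexPrefixDemo {

    // Mirrors the shape of the original expression: the prefix is used as a regex as-is.
    static String stripUnquoted(String path, String prefix) {
        return path.replaceAll("^" + prefix, "");
    }

    // Same idea, but the prefix is quoted so every character is matched literally.
    static String stripQuoted(String path, String prefix) {
        return path.replaceAll("^" + Pattern.quote(prefix), "");
    }

    public static void main(String[] args) {
        // Hypothetical values, already normalized to forward slashes.
        String prefix = "c:/Program Files (64)/pentaho/";
        String path = prefix + "reports/sales.prpt";

        // "(64)" is read as a group matching "64", so the pattern never matches
        // the literal parentheses and the full path comes back unchanged.
        System.out.println(stripUnquoted(path, prefix));

        // With the prefix quoted, only the relative part remains.
        System.out.println(stripQuoted(path, prefix));
    }
}
```

Any other regex metacharacter in the install path (for example `+` or `[`) would trip the unquoted version in a similar way.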

Fixing the bug


With this in mind, we can rewrite the expression in a different form. I chose a more common substring approach:

org.pentaho.di.core.vfs.KettleVFS.getInstance().getFileObject(filename).toString().substring(elementPath.length())

By previewing this step we see that we get the expected result set - the file names without the initial repository location.
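The substring idea can be tried in isolation with a small Java sketch. The class, method and sample values below are hypothetical, and the `startsWith` guard is an addition for this sketch that is not part of the expression used in the transformation.

```java
public class SubstringPrefixDemo {

    // Drops the repository location from the front of a file name by length,
    // with no regular expression involved.
    static String stripPrefix(String filename, String elementPath) {
        // Guard added for this sketch only: leave the name untouched
        // if it does not actually start with the given location.
        if (!filename.startsWith(elementPath)) {
            return filename;
        }
        return filename.substring(elementPath.length());
    }

    public static void main(String[] args) {
        // Hypothetical values; parentheses are now just ordinary characters.
        String elementPath = "c:/Program Files (64)/pentaho/";
        String filename = elementPath + "reports/sales.prpt";
        System.out.println(stripPrefix(filename, elementPath));
    }
}
```

Because the prefix is removed by length, this relies on the file name and the location being written in the same form (same separators, same casing).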


Now if we go back to our dashboard, we'll be able to see if this actually fixed the initial problem.


Et voilà! Bug fixed, committed, and a new version of PRS is now available for download.

I may be absolutely biased - but I do believe this is just awesome! :)





