Talend Open Studio is an open-source data integration/ETL tool for building complex jobs through an easy-to-use graphical interface based on the Eclipse platform. Its hundreds of prebuilt components for connecting to source and target systems allow for rapid development.
However, as great as Talend is, there is one caveat: the FREE version does not come with version control or a shared job repository integrated into the product.
Problem
This means that if one developer creates a job design and a second developer later needs to modify it to handle new requirements, the second developer cannot do so, because the job lives only on the first developer’s workstation.
Prerequisites
- You will need to have a Subversion Repository (SVN Server).
- You will need a Subversion client, TortoiseSVN. This will allow you to commit code (or any other type of file you want to version) to, and check out code from, the Subversion server. I recommend that everybody be on the same version. I use the latest version at the moment, which is 1.7.5.
- In order to develop new or maintain existing Talend jobs, you will need Talend Open Studio. You can download it from Sourceforge.net (no questions asked). We are using version 5.0.2. I would also recommend that everybody be on the same version.
- In order to run jobs from the command line (like they are going to run once deployed to the server) you are going to need the Java JDK 1.5 or greater on your machine. Instructions on how to set up Java on your machine can be found here.
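Before going further, it can help to confirm that the command-line tools are reachable. The following is only a minimal Python sketch and not part of the original setup. It assumes that a command-line `svn` client is installed (for example as an optional component next to TortoiseSVN) and that `java` is on your PATH. The helper name `missing_tools` is made up for illustration.

```python
import shutil
from typing import Callable, List, Optional


def missing_tools(
    tools: List[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be tested without the tools
    actually being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # "svn" and "java" are the assumed executable names for this tutorial.
    missing = missing_tools(["svn", "java"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("svn and java are both available.")
```

If `svn` is reported as missing, the TortoiseSVN right-click menus described below still work; only the command-line sketches in this post would need the extra client.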
Solution
Here, I will show you how to integrate Talend with SVN so that an entire team can work on the same job code base.
- Checking Out a Talend Project from a Shared Job Repository
- I Cannot View all my Jobs, Contexts, and Metadata in my TOS!!!
- Creating a Shared Job Repository
- Checking In New Changes to a Shared Job Repository
- Updating your local workspace
- Resolving File Conflicts
Checking Out a Talend Project from a Shared Job Repository
- If you have TOS open, close it.
- Go to your Talend workspace. In my case: C:\Talend5.0.1\TOS_DI-Win32-r74687-V5.0.1\workspace
- Right-click inside the workspace folder->SVN Checkout…
- Put the SVN Server URL of your Repository:
- The Talend Project from the server should be in your workspace now:
- Open TOS. You should see the project listed as follows:
- All the Jobs associated with that Project should be visible. If not, read the next section of the tutorial.
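For those who prefer the command line, the checkout steps above can be sketched in Python. This is an illustration under assumptions, not the method used in the screenshots: it presumes a command-line `svn` client is available, and the helper names, the URL, and the project name in the example are placeholders.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List


def checkout_command(repo_url: str, workspace: str, project_name: str) -> List[str]:
    """Build `svn checkout URL TARGET`, with TARGET inside the Talend workspace."""
    target = PureWindowsPath(workspace) / project_name
    return ["svn", "checkout", repo_url, str(target)]


def run_svn(command: List[str]) -> None:
    """Run a prepared svn command and raise if svn reports an error."""
    subprocess.run(command, check=True)


# Example call (placeholder URL, workspace path, and project name):
# run_svn(checkout_command(
#     "https://svn.example.com/repo/DEMO",
#     r"C:\path\to\workspace",
#     "DEMO",
# ))
```

As with the TortoiseSVN steps, close TOS before running the checkout and reopen it afterwards.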
I Cannot View all my Jobs, Contexts, and Metadata in my TOS!!!
After checking out an entire Project from Subversion, or doing an SVN Update on an existing Project in your workspace, it is possible that your GUI has not picked up the changes. You can do two things to take care of that:
- Do a Refresh of the Repository left panel
- Do an Import Items. From TOS, Right click on Job Designs->Import Items
Creating a Shared Job Repository
- Create a Talend Project. This Talend project will house multiple Jobs. (Remember not to confuse Project with Job)
- Navigate to your Talend workspace. Right-click on the project->TortoiseSVN->Import…
- The following prompt will come up. Enter your repository URL and a log message.
- You should see the following message:
- Type your SVN Repo URL in your browser and verify that the contents are there.
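The import and the verification step can also be sketched with the command-line client. This is a hedged example: it assumes a command-line `svn` is installed, and the function names and the example values are invented for illustration. It only builds the commands; running them is left to you.

```python
from typing import List


def import_command(project_dir: str, repo_url: str, message: str) -> List[str]:
    """Build `svn import PATH URL -m MESSAGE` to upload an unversioned project."""
    return ["svn", "import", project_dir, repo_url, "-m", message]


def list_command(repo_url: str) -> List[str]:
    """Build `svn list URL`, a command-line stand-in for checking in the browser."""
    return ["svn", "list", repo_url]


# Example (placeholder path, URL, and message):
# import subprocess
# subprocess.run(import_command(r"C:\path\to\workspace\DEMO",
#                               "https://svn.example.com/repo/DEMO",
#                               "Initial import of Talend project"), check=True)
# subprocess.run(list_command("https://svn.example.com/repo/DEMO"), check=True)
```

One thing to keep in mind: an SVN import uploads the files but does not turn the local folder into a working copy, so the developer who did the import also needs to check the project out afterwards, as described in the checkout section.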
Checking In New Changes to a Shared Job Repository
Rule of Thumb: ALWAYS, ALWAYS, ALWAYS before checking in anything, do an SVN Update first and resolve any existing conflicts. Read the next section for how to resolve conflicts.
If you do not have any conflicts to resolve, you are ready to commit your changes. Do as follows:
- In Windows Explorer, right-click the project folder in your workspace->TortoiseSVN->Check For Modifications
This will look for modifications locally.
Note: This step is optional; you can skip it and go straight to the commit below.
- Right-click your project in your workspace->SVN Commit…
- Click ‘OK’. Now other developers should be able to do an SVN Update and work off of the latest changes.
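The "check for modifications, then commit" routine above can be approximated on the command line. The sketch below is an assumption-laden illustration: it parses plain `svn status` output, assuming the usual layout in which the first character is the item status and the path starts at the ninth character. It ignores property-only changes and tree conflicts. The function names and the sample file paths are made up.

```python
from typing import Dict, List

# First-column status codes printed by `svn status` that this sketch handles.
STATUS_LABELS = {
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "C": "conflicted",
    "?": "unversioned",
    "!": "missing",
}


def parse_status(output: str) -> Dict[str, List[str]]:
    """Group paths from `svn status` output by their first-column status."""
    result: Dict[str, List[str]] = {}
    for line in output.splitlines():
        # Skip blank lines, headers, and any status code not listed above.
        if len(line) < 9 or line[0] not in STATUS_LABELS:
            continue
        result.setdefault(STATUS_LABELS[line[0]], []).append(line[8:].strip())
    return result


def commit_command(project_dir: str, message: str) -> List[str]:
    """Build `svn commit PATH -m MESSAGE`."""
    return ["svn", "commit", project_dir, "-m", message]


# Made-up sample of what `svn status` might print inside a project folder.
sample = (
    "M       process/LoadCustomers_0.1.item\n"
    "C       process/LoadOrders_0.1.item\n"
    "?       temp/debug.log\n"
)
changes = parse_status(sample)
if changes.get("conflicted"):
    print("Resolve these before committing:", changes["conflicted"])
```

Checking for a `conflicted` entry before building the commit command mirrors the rule of thumb: never commit while conflicts are still open.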
Updating your local workspace
- Right-click your Project folder in your workspace on the C:\ drive->SVN Update
In this case there was nothing new on the server, so nothing was updated. If you get conflicts, please read the next section.
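To tell "nothing new on the server" apart from "files were updated" in a script, one option is to look at the last revision line that `svn update` prints. The sketch below assumes the English messages "At revision N." and "Updated to revision N."; a localized client may print something else. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

REVISION_LINE = re.compile(r"^(At|Updated to) revision (\d+)\.$")


def update_command(project_dir: str) -> List[str]:
    """Build `svn update PATH`."""
    return ["svn", "update", project_dir]


def summarize_update(output: str) -> Optional[Tuple[bool, int]]:
    """Return (changed, revision) from `svn update` output, or None if not found.

    `changed` is False when svn reported "At revision N." (nothing new)
    and True when it reported "Updated to revision N.".
    """
    # Search from the end, because a conflict summary may follow the revision line.
    for line in reversed(output.splitlines()):
        match = REVISION_LINE.match(line.strip())
        if match:
            return (match.group(1) == "Updated to", int(match.group(2)))
    return None
```

A `None` result simply means the sketch did not recognise the output, in which case it is safer to read the full output by hand.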
Resolving File Conflicts
When doing shared development with Version Control Systems (Subversion, Git, CVS, etc…) it could be that another person has edited a common file that you have also edited. What we want here is to keep the other developer’s change as well as ours. We have to resolve a conflict. This can be done in 5 easy steps.
- Go to your Project folder (C:/path_to_your_talend_workspace_project_folder) and do Right Click->SVN Update. This will update any local files with newer ones from the server.
- If it encounters a file that you have also edited, it will raise a conflict. YOU MUST RESOLVE IT!!
- Right Click on the Conflicted file and select Edit Conflicts. The TortoiseMerge editor will come up:
- To merge files do as follows:
- Finally, you must mark the file as resolved.
What if I want to accept their file and override mine?
Right-click the conflicted file->Resolve conflict using ‘theirs’
Or, to keep your file and discard theirs:
Right-click the conflicted file->Resolve conflict using ‘mine’
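The same three outcomes (keep theirs, keep mine, or keep a hand-merged file) can be expressed with `svn resolve --accept` on the command line. This is a sketch under assumptions: the mapping from the TortoiseSVN menu entries to the `--accept` values is my reading, not something taken from the TortoiseSVN screenshots, and the names `ACCEPT_OPTIONS` and `resolve_command` are invented.

```python
from typing import List

# Assumed mapping from the choices in this section to `svn resolve --accept` values.
ACCEPT_OPTIONS = {
    "theirs": "theirs-full",  # take the server version, discard local edits
    "mine": "mine-full",      # keep the local version, discard incoming edits
    "merged": "working",      # keep the file as it now stands after a manual merge
}


def resolve_command(path: str, choice: str) -> List[str]:
    """Build `svn resolve --accept VALUE PATH` for one conflicted file."""
    if choice not in ACCEPT_OPTIONS:
        raise ValueError(f"choice must be one of {sorted(ACCEPT_OPTIONS)}")
    return ["svn", "resolve", "--accept", ACCEPT_OPTIONS[choice], path]


# Example (made-up file name):
# import subprocess
# subprocess.run(resolve_command("process/LoadOrders_0.1.item", "theirs"), check=True)
```

The "merged" choice corresponds to the last step of the list above: after editing the file in TortoiseMerge, it marks the file as resolved in its current state.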
It is a little bit cumbersome, but once you get the hang of it, it is not that bad. It is unfortunate that Talend does not release Talend Open Studio with hooks to Subversion and Git right out of the box. I doubt that organizations are buying the paid product, Talend Integration Studio, mostly for the Shared Repository.