While some people use Kiba ETL standalone, it is increasingly common to see Kiba added to an existing application that needs to process data in one way or another.
See this StackOverflow question:
I have to load data into a Spree application. Spree makes use of Rails Engines. All the examples use pretty-print or CSV destinations, but I want to use Spree models in the destination, e.g.:
SpreeModel.create!(row)
To solve this, we must make sure that the Kiba script has access to the Rails environment, so that it can directly refer to existing models, leverage the database.yml connection settings, and so on.
There are two ways to achieve this.
Loading your Rails environment from Kiba scripts
Before anything, add kiba to your Gemfile and run bundle install.
Then let’s imagine you place your ETL scripts under an etl folder:
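Here is what a minimal script could look like. This is only a sketch: the file name, the CSV columns and the Spree::Product model are illustrative, and the source/destination classes are defined inline for brevity (they could just as well live in a separate file required from the script).

```ruby
# etl/import_products.etl (file name, columns and model are illustrative)
require "csv"

# Source: yields one Hash per CSV line
class CSVSource
  def initialize(filename)
    @filename = filename
  end

  def each
    CSV.foreach(@filename, headers: true, header_converters: :symbol) do |row|
      yield row.to_h
    end
  end
end

# Destination: persists each row using a model from the host app,
# as asked in the Spree question above
class ProductDestination
  def write(row)
    Spree::Product.create!(row)
  end
end

source CSVSource, "data/products.csv"

transform do |row|
  row.slice(:name, :price) # keep only the attributes we care about
end

destination ProductDestination
```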
To access the Rails environment from this script, you just have to add this line at the top of it:
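Assuming the etl folder sits at the root of the Rails application (one level above config), that line would typically be:

```ruby
# at the top of etl/import_products.etl
require_relative "../config/environment"
```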
Once you have that, you can simply use this to launch your script:
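With the hypothetical file name used above, that would be:

```sh
bundle exec kiba etl/import_products.etl
```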
The whole Rails environment will be loaded just before the rest of the script is parsed and run.
Calling Kiba programmatically from a Rake task
Another possibility is to call Kiba programmatically, for instance from a Rake task.
Some Kiba users have also started calling it from inside Resque or Sidekiq jobs (so: triggered by neither the Kiba CLI nor a Rake task). It works, but support is a bit lacking in that area (e.g. getting structured output back from the run), so future work is required to improve this!
To call Kiba from a Rake task, you must add kiba to your Gemfile first and run bundle install.
Then create your Rake task to programmatically invoke your Kiba script:
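Below is a sketch of what such a task could look like; the namespace, task name and script path are assumptions. It relies on Kiba.parse accepting the script source as a string, the way the kiba command itself runs scripts; on more recent Kiba versions that dropped this form, you would define the job inside a Kiba.parse block instead.

```ruby
# lib/tasks/etl.rake (namespace, task and file names are illustrative)
namespace :etl do
  desc "Import products via the Kiba script"
  task import_products: :environment do
    # The :environment prerequisite boots the full Rails app,
    # so models and database configuration are available to the script.
    script = "etl/import_products.etl"
    job = Kiba.parse(IO.read(script), script)
    Kiba.run(job)
  end
end
```

You would then run it with bundle exec rake etl:import_products.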
In this case too, your ETL script definition will have access to all your Rails models and other classes, so you can get to work importing data.
Now you can more easily add data-processing tasks from the comfort of your Rails application!