While some people use Kiba ETL in a standalone fashion, it is increasingly common to see Kiba added to an existing application that needs to process data in one way or another.
See this StackOverflow question:
I have to load data into a Spree application. Spree makes use of Rails Engines. All examples use pretty print or CSV destinations, but I want to use Spree models in the destination, e.g.:
SpreeModel.create!(row)
To solve this, we must make sure that the Kiba script has access to the Rails environment, so that it can refer directly to existing models, reuse the database.yml connection, etc.
There are 2 ways to achieve this.
Loading your Rails environment from Kiba scripts
Before anything, add kiba to your Gemfile and run bundle install.
Then, let’s imagine you place your ETL scripts under an etl folder:
$ cd my-rails-app
$ mkdir etl && cd etl
$ touch my-script.etl
To access the Rails environment from this script, you just have to add this line at the top of it:
# put this at the top of your script
require_relative '../config/environment'
# then declare your Kiba ETL script
Once you have that, you can launch your script with:
$ bundle exec kiba etl/my-script.etl
# or in production:
$ RAILS_ENV=production bundle exec kiba etl/my-script.etl
The whole Rails environment is loaded first, before the rest of the script is parsed and run.
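With the environment loaded, the script can refer to models such as the SpreeModel from the question. One way to write rows to a model is a small destination class. The following is a minimal sketch, assuming Kiba's class-based destination convention: Kiba builds the class with the arguments given after destination in the script, calls write(row) once per row, and calls close at the end of the run if it is defined. ModelDestination is a hypothetical name, not something Kiba provides:

```ruby
# Hypothetical Kiba destination that persists each row through a model class.
# Kiba instantiates it with the arguments listed after `destination` in the
# script, calls #write once per row, then #close (if defined) at the end.
class ModelDestination
  def initialize(model_class)
    @model_class = model_class
  end

  # create! raises on validation errors, which stops the ETL run
  # instead of silently skipping invalid rows
  def write(row)
    @model_class.create!(row)
  end

  def close
    # nothing to release here; a batching variant could flush pending rows
  end
end
```

In the ETL script, it would then be declared with destination ModelDestination, SpreeModel. Because create! receives the row as-is, the row keys must match the model's attribute names, so a transform may be needed upstream to rename or drop columns.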
Calling Kiba programmatically from a Rake task
Another possibility is to call Kiba programmatically, for instance from a Rake task.
Some Kiba users have also started calling it from inside Resque or Sidekiq jobs (that is, triggered by neither the Kiba CLI nor a Rake task). This works, but support in that area is still a bit lacking (e.g. for getting structured output back from the command), so future work is needed to improve it!
To call Kiba from a Rake task, first add kiba to your Gemfile and run bundle install.
Then create your Rake task to programmatically invoke your Kiba script:
# lib/tasks/etl.rake
task :etl => :environment do
  etl_filename = 'etl/test.etl'
  script_content = File.read(etl_filename)
  # pass etl_filename so that errors report the right file and line numbers
  job_definition = Kiba.parse(script_content, etl_filename)
  Kiba.run(job_definition)
end
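The task above hard-codes the script path. A possible variation, sketched here under assumptions (the etl:run task name and the default path are hypothetical), takes the path as a Rake task argument so that a single task can run any script. It relies on Rake's standard task-argument support (TaskArguments#with_defaults):

```ruby
require 'rake' # not needed inside a Rails .rake file, where Rake is already loaded

namespace :etl do
  desc 'Run a Kiba ETL script, e.g. rake "etl:run[etl/my-script.etl]"'
  task :run, [:filename] => :environment do |_task, args|
    # fall back to a default script when no argument is given (hypothetical path)
    args.with_defaults(filename: 'etl/my-script.etl')
    etl_filename = args[:filename]
    script_content = File.read(etl_filename)
    # pass etl_filename so that errors report the right file and line numbers
    job_definition = Kiba.parse(script_content, etl_filename)
    Kiba.run(job_definition)
  end
end
```

It would be invoked with bundle exec rake "etl:run[etl/my-script.etl]"; the quotes keep shells such as zsh from interpreting the square brackets.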
Here too, your ETL script will have access to all your Rails models and other classes, so you can import data directly into them.
Now you can more easily add data-processing tasks from the comfy comfort of your Rails application!