How to run Kiba ETL in a Rails environment?

September 26, 2015

While some people use Kiba ETL standalone, it’s increasingly common to see Kiba added to an existing application that needs to process data in one way or another.

See this StackOverflow question:

I have to load data into a Spree application. Spree makes use of Rails Engines. All examples use pretty print or CSV destinations, but I want to use Spree models in the destination, eg. SpreeModel.create!(row)

To solve this, we must make sure that the Kiba script has access to the Rails environment, so that it can quickly refer to existing models, leverage the database.yml connection, etc.

There are two ways to achieve this.

Loading your Rails environment from Kiba scripts

Before anything, add kiba to your Gemfile and run bundle install.

Then let’s imagine you place your ETL scripts under an etl folder:

$ cd my-rails-app
$ mkdir etl && cd etl
$ touch my-script.etl

To access the Rails environment from this script, you just have to add this line at the top of it:

# put this at the top of your script
require_relative '../config/environment'

# then declare your Kiba ETL script

Once that line is in place, you can launch your script like this:

$ bundle exec kiba etl/my-script.etl
# or in production:
$ RAILS_ENV=production bundle exec kiba etl/my-script.etl

The whole Rails environment will be loaded before the rest of the script is parsed and then run.
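
With the environment loaded, a destination can write rows through your models, which is what the StackOverflow question above asks for. Here is a minimal sketch, assuming each row is a hash whose keys match the model's attributes. The class name ActiveRecordDestination and its model_class argument are hypothetical, not something Kiba ships with. Kiba only expects a destination to respond to write(row):

```ruby
# Hypothetical destination: the class name and its `model_class` argument
# are illustrative and not part of Kiba itself.
class ActiveRecordDestination
  def initialize(model_class)
    @model_class = model_class
  end

  # Kiba calls `write` once for each row that reaches the destination.
  # Assumes the row is a hash whose keys match the model's attributes.
  def write(row)
    @model_class.create!(row)
  end
end

# In etl/my-script.etl, below the require_relative line, it could be
# declared like this (SpreeModel is the placeholder from the question):
#   destination ActiveRecordDestination, SpreeModel
```

Because create! raises on invalid records, a bad row would stop the run. Whether that is the behaviour you want depends on your data.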

Calling Kiba programmatically from a Rake task

Another possibility is to call Kiba programmatically, for instance from a Rake task.

Some Kiba users have also started calling it from inside Resque or Sidekiq jobs (that is, triggered by neither the Kiba CLI nor a Rake task). It works, but support is still a bit lacking in that area (e.g. getting structured output out of the command), so future work is required to improve this!

To call Kiba from a Rake task, you must add kiba to your Gemfile first and run bundle install.

Then create your Rake task to programmatically invoke your Kiba script:

task :etl => :environment do
  etl_filename = 'etl/test.etl'
  script_content = File.read(etl_filename)
  # pass etl_filename to get line numbers on errors
  job_definition = Kiba.parse(script_content, etl_filename)
  Kiba.run(job_definition)
end

In this case too, your ETL script definition will have access to all your Rails models and other classes, so you can get to work importing data.
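
If you expect to trigger the same script from more than one place (a Rake task today, perhaps a background job later), the parse-and-run sequence can be wrapped in a small helper. This is only a sketch: the EtlRunner module and its kiba: keyword are hypothetical names, and the keyword exists purely so the helper can be exercised without Kiba. It assumes the Rails environment is already loaded by whatever calls it.

```ruby
# Hypothetical helper; `EtlRunner` and the `kiba:` keyword are illustrative
# names, not part of Kiba.
module EtlRunner
  # Reads the script, parses it into a job definition, then runs it.
  # `kiba:` defaults to the Kiba module; it is injectable only for testing.
  def self.run(etl_filename, kiba: Kiba)
    script_content = File.read(etl_filename)
    # Passing the filename lets errors report line numbers in the script.
    job_definition = kiba.parse(script_content, etl_filename)
    kiba.run(job_definition)
  end
end

# From a Rake task, the call could look like this:
#   task :etl => :environment do
#     EtlRunner.run('etl/test.etl')
#   end
```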

Now you can more easily add data-processing tasks from the comfort of your Rails application!