For the past few months, I’ve been using Elixir’s newly added ability to write self-contained programs via
Mix.install/2 (https://hexdocs.pm/mix/1.14.0/Mix.html#install/2) more and more, including for my scripting needs.
Here is a quick example of Mix.install/2 use, and you can find many more examples at https://github.com/wojtekmach/mix_install_examples:
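As a minimal sketch of such a script (the file name my_script.exs, the req version constraint and the queried URL are assumptions chosen for illustration, not taken from the original listing):

```elixir
# my_script.exs
# Declare the dependencies inline; they are fetched and compiled on first run,
# then reused from a local cache on later runs.
Mix.install([
  {:req, "~> 0.5"}
])

# Fetch a JSON document; Req decodes the JSON body into a map by default.
response = Req.get!("https://api.github.com/repos/wojtekmach/req")

IO.puts(response.body["description"])
```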
What I love about this is that the script is self-contained: it can be run with just elixir my_script.exs, and all the dependencies are then installed automatically!
I find it very convenient and use it for scripting on a regular basis now.
On HTTP caching with Req
req has become my preferred Elixir HTTP client for these scripting needs:
- It relies on a solid foundation (finch, which itself relies on mint, all of them solidly maintained libraries).
- It has useful defaults (such as JSON decoding) that you can easily bypass.
- Its API is extensible.
See more in the docs.
When munging data locally, I very often need to replay a data-processing code sequence repeatedly. When it involves HTTP queries, this can be painfully slow.
Req includes a simple cache, based on the if-modified-since HTTP request header.
Sadly, a lot of servers I’m querying do not handle that header well, if at all.
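For reference, here is a small sketch of how that built-in cache is switched on, using Req's cache option. The URL and the version constraint are assumptions for illustration.

```elixir
Mix.install([
  {:req, "~> 0.5"}
])

url = "https://api.github.com/repos/wojtekmach/req"

# First call: the response is downloaded and, if it is a 200, stored on disk.
first = Req.get!(url, cache: true)

# Later calls send an if-modified-since header. The stored copy is only reused
# when the server answers 304 Not Modified, so the benefit depends entirely on
# the server honouring that header.
second = Req.get!(url, cache: true)

IO.inspect({first.status, second.status})
```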
A Req-based solution
To solve that, no matter the tech stack, I very often use a form of permanent HTTP disk caching. This makes my scripts work whether or not the remote server supports caching well.
Luckily, Req is extensible: it lets you register “steps” in the request processing, the response processing and the error processing (see documentation).
It is actually quite flexible; a plugin can, for instance, register its own options.
Plugging my code into Req was something new to me, so I’m sharing it here. Some parts are actually extracted from the req cache step directly (I’ve provided links to the original in those cases):
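The following is a minimal sketch of what such a plugin could look like, not necessarily the exact code from this post. The module name CustomCache, the attach/1 function, the :custom_cache_dir option, the step names and the version constraint are all assumptions made for illustration. Only Req.Request.register_options/2, append_request_steps/2 and prepend_response_steps/2 are Req's own API.

```elixir
Mix.install([
  {:req, "~> 0.5"}
])

defmodule CustomCache do
  @moduledoc "Sketch of a Req plugin that caches responses on disk permanently."

  # Hypothetical entry point: registers the option and hooks both steps in.
  def attach(request) do
    request
    |> Req.Request.register_options([:custom_cache_dir])
    |> Req.Request.append_request_steps(custom_cache: &request_step/1)
    |> Req.Request.prepend_response_steps(custom_cache: &response_step/1)
  end

  # Request step: on a cache hit, return {request, response}. Req then skips
  # the HTTP call and hands the stored response to the response steps.
  def request_step(request) do
    path = cache_path(request)

    if File.exists?(path) do
      {request, path |> File.read!() |> :erlang.binary_to_term()}
    else
      request
    end
  end

  # Response step: prepended, so it sees the raw response before decoding.
  # Assumption: only 200 responses are stored, so that errors are never replayed.
  def response_step({request, response}) do
    path = cache_path(request)

    if response.status == 200 and not File.exists?(path) do
      File.mkdir_p!(Path.dirname(path))
      File.write!(path, :erlang.term_to_binary(response))
    end

    {request, response}
  end

  # The file name is derived from the HTTP method and a hash of the full URL.
  defp cache_path(request) do
    cache_dir =
      request.options[:custom_cache_dir] ||
        raise(ArgumentError, "missing :custom_cache_dir option")

    hash =
      :crypto.hash(:sha256, URI.to_string(request.url))
      |> Base.encode16(case: :lower)

    Path.join(cache_dir, "#{request.method}-#{hash}")
  end
end
```

There is no expiration in this sketch: wiping the cache directory by hand is the only way to force a refresh.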
Now that it is available, you can use it this way:
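A possible usage sketch, under stated assumptions: the plugin is a module named CustomCache exposing attach/1 and a :custom_cache_dir option (hypothetical names), and it is saved, together with its Mix.install call, in a hypothetical custom_cache.exs file next to this script. The URL and the directory name are arbitrary.

```elixir
# Assumption: custom_cache.exs installs req via Mix.install and defines the
# CustomCache plugin module.
Code.require_file("custom_cache.exs", __DIR__)

# Build a reusable request and attach the plugin to it.
req = Req.new() |> CustomCache.attach()

# The first run performs the HTTP call; later runs are served from ./cache.
response =
  Req.get!(req,
    url: "https://api.github.com/repos/wojtekmach/req",
    custom_cache_dir: "cache"
  )

IO.inspect(response.status)
```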
What happens is:
- I create a high-level API
- I “attach” my plugin to it (the name of the function does not matter)
- This registers the option key (via register_options), a feature useful to detect mistyped option keys (pretty cool)
- The request and response “steps” are attached to the processing pipeline
- A cache filename is constructed based on the URL
- The whole Response is serialized on disk with :erlang.term_to_binary(response) (which means everything is stored: body, headers & HTTP status code), and deserialized at the right moment in the pipeline (before JSON-decoding occurs, typically)
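The last two points can be tried in isolation with the standard library only. In this sketch a plain map stands in for the Req response, and the URL and the hashing scheme are assumptions for illustration.

```elixir
# Stand-in for a response; a plain map keeps the example dependency-free.
response = %{
  status: 200,
  headers: [{"content-type", "application/json"}],
  body: ~s({"ok":true})
}

url = "https://example.com/data.json"

# Hash the URL to get a fixed-length file name that is safe on any filesystem.
file_name = :crypto.hash(:sha256, url) |> Base.encode16(case: :lower)
path = Path.join(System.tmp_dir!(), file_name)

# Serialize the whole term to disk, then read it back.
File.write!(path, :erlang.term_to_binary(response))
restored = path |> File.read!() |> :erlang.binary_to_term()

# Status, headers and body all survive the round trip.
IO.inspect(restored == response)
```

Because the stored term is the undecoded response, decoding steps such as JSON parsing still run on it after it is loaded back.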
Although diving into the pipeline was a bit complicated initially, I’m fairly happy with the result: it lets me keep leveraging all the interesting features (retries, decoding) that come with Req by default, and keeps my local data experiments fast :-)