Terraform: Resource not found

Here are a few things I learned and did when I encountered the very verbose "Resource not found" error from Terraform.

Debug your Infrastructure as Code

More logs?

More logs are my obvious choice, or go-to. Terraform supports different log levels, although it will tell you itself that every level but TRACE is not to be trusted:

2021/03/02 09:21:33 [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility. Use TF_LOG=TRACE to see Terraform's internal logs.

FWIW, DEBUG and ERROR produce decent output for narrowing down problems, while TRACE is overwhelming, which is not very helpful either.

Refresh, plan?

To narrow down a problem I can run terraform refresh (or import, or plan) and hope for the best, but what I found incredibly valuable was adding -target to any of these commands. This allows me to test resources one by one.

To retrieve a list of what is currently known to Terraform's state:

$ terraform state list
data.openstack_images_image_v2.centos
data.openstack_networking_network_v2.public_network
openstack_compute_instance_v2.jump_host
openstack_compute_keypair_v2.ssh_key
openstack_networking_network_v2.network
openstack_networking_secgroup_rule_v2.jump_host_rule
openstack_networking_secgroup_rule_v2.monitoring_rule
openstack_networking_secgroup_v2.jump_group
openstack_networking_subnet_v2.monitoring

Which seems accurate in my case.

Then I proceeded to go through them one by one to see which would fail:

$ terraform plan -target openstack_compute_keypair_v2.ssh_key
...
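
Going through each resource can be automated with a small loop. In the sketch below, the terraform shell function at the top is a stub standing in for the real binary so the loop can be tried anywhere; delete it to run against a real working directory.

```shell
# Stub standing in for the real terraform binary (delete to use the real one).
terraform() {
  case "$1" in
    state) printf '%s\n' \
             openstack_compute_keypair_v2.ssh_key \
             openstack_compute_instance_v2.jump_host ;;
    plan)  echo "(pretend plan for $3)" ;;
  esac
}

# Plan every resource known to the state, one at a time.
for resource in $(terraform state list); do
  echo "== $resource"
  terraform plan -target "$resource" || echo "FAILED: $resource"
done
```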

Of course, it only failed on the one using literally everything else:

$ terraform plan -target openstack_compute_instance_v2.jump_host
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.openstack_networking_network_v2.public_network: Refreshing state... [id=foo]
data.openstack_images_image_v2.centos: Refreshing state... [id=foo]
openstack_compute_keypair_v2.ssh_key: Refreshing state... [id=foo]
openstack_networking_network_v2.network: Refreshing state... [id=foo]
openstack_networking_subnet_v2.monitoring: Refreshing state... [id=foo]
openstack_compute_instance_v2.jump_host: Refreshing state... [id=foo]

Error: Resource not found


Releasing state lock. This may take a few moments...

Provider

If you've read this far, you probably feel my pain. Let's take a look at the provider, which in my case is the OpenStack provider for Terraform. This is where I wish I had looked yesterday.

The OpenStack provider comes with its own debug switch: OS_DEBUG=1. It only takes effect in combination with an appropriate TF_LOG= setting (spoiler: not TF_LOG=TRACE).

This is what I started out with:

$ TF_LOG=ERROR OS_DEBUG=1 terraform plan -target openstack_compute_instance_v2.jump_host
... [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility.
  Use TF_LOG=TRACE to see Terraform's internal logs.
  ----
<...snip...>
openstack_networking_subnet_v2.monitoring: Refreshing state... [id=foo]
openstack_compute_instance_v2.jump_host: Refreshing state... [id=foo]
... [ERROR] eval: *terraform.EvalRefresh, err: Resource not found
... [ERROR] eval: *terraform.EvalSequence, err: Resource not found

Error: Resource not found


Releasing state lock. This may take a few moments...

Slightly more helpful (well, not really).

Now re-run the command with TF_LOG=DEBUG and the output will contain API calls made to OpenStack:

... [DEBUG] ..._v1.32.0: Vary: OpenStack-API-Version X-OpenStack-Nova-API-Version
... [DEBUG] ..._v1.32.0: X-Compute-Request-Id: bar
... [DEBUG] ..._v1.32.0: X-Openstack-Nova-Api-Version: 2.1
... [DEBUG] ..._v1.32.0: X-Openstack-Request-Id: bar
... [DEBUG] ..._v1.32.0: 2021/03/02 11:46:21 [DEBUG] OpenStack Response Body: {
... [DEBUG] ..._v1.32.0:   "itemNotFound": {
... [DEBUG] ..._v1.32.0:     "code": 404,
... [DEBUG] ..._v1.32.0:     "message": "Flavor foobar could not be found."
... [DEBUG] ..._v1.32.0:   }
... [DEBUG] ..._v1.32.0: }

And that explains why my terraform plan fails: the flavour I used four months ago is no longer available.
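
Next time, I can grep the captured debug output for provider-level errors directly. The snippet below writes a fake log line standing in for real TF_LOG=DEBUG output so it can be tried anywhere; in a real run you would capture stderr, e.g. with terraform plan 2> plan.log.

```shell
# Fake TF_LOG=DEBUG output standing in for a real capture.
cat > plan.log <<'EOF'
... [DEBUG] OpenStack Response Body: {
  "itemNotFound": {
    "code": 404,
    "message": "Flavor foobar could not be found."
  }
}
EOF

# Pull the buried provider error (plus some context) out of the log.
grep -B1 -A3 '"itemNotFound"' plan.log
```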

Fin

If I ever get to it, I have to figure out why those error messages are not bubbled up. Or why TF_LOG=DEBUG doesn't invoke OS_DEBUG=1.

Thank you for reading. Have a great day!

Ansible Galaxy: Install private roles from private GitHub repositories

When I googled how to install private roles using ansible-galaxy, I found suggestions such as, "use git+https://github.com/..." or even better, "I am not sure what you're doing, but it works for me (since Ansible 2.2)".

So, since neither of these suggestions helped me and I was unable to find documentation with obvious examples, here is how to achieve this.

Assuming you have your ssh key and configuration figured out, put this into requirements.yml:

---
- name: namespace.role
  src: git@github.com:my-organization/private-repository.git
  version: 1.0.0

This forces ansible-galaxy install -r requirements.yml to git-clone the role using your ssh key.
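
For completeness: if the deploy key does not live in the default location, a matching block in ~/.ssh/config keeps the clone non-interactive. The key path here is a made-up example.

```
Host github.com
  User git
  IdentityFile ~/.ssh/deploy_key
  IdentitiesOnly yes
```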

Prometheus: relabel your scrape_config

Prometheus labels every data point — the most well-known example of a label is (probably) instance.

Take a look at this query result (query: up{job="prometheus"}):

up{instance="127.0.0.1:9090",job="prometheus"} 1

So what does this tell me?

I queried for the "up" metric and filtered it for the "prometheus" job — yay. The "1" says my service is alive. So far so good.

Readability

Since we are in the process of running a few Prometheus servers (in federation), each of those metrics will report back with instance="127.0.0.1:9090" (along with other labels of course).

While this works, I'm not a computer. If the "instance" reported an FQDN or some other readable name, it would make any dashboard or alert more approachable. Or readable, if you will.

The instance label

instance is a standard field used in various Grafana dashboards out there. Dashboards often use the value in instance to provide you with a dropdown list of (well) instances (or nodes) to select from.

To not end up with a dropdown full of 127.0.0.1:9090, here is a snippet on how to work with labels to make life a little easier.

Rewriting labels

Consider the following scrape_config:

- job_name: "prometheus"
  metrics_path: "/metrics"
  static_configs:
  - targets:
    - "127.0.0.1:9090"

It produces the result above.

Now, extend it slightly to include a name and relabel the instance field with it:

- job_name: "prometheus"
  metrics_path: "/metrics"
  relabel_configs:
    - source_labels: [name]
      target_label: instance
  static_configs:
  - targets:
    - "127.0.0.1:9090"
    labels:
      name: my-prometheus.example.org

Query again:

up{instance="my-prometheus.example.org",job="prometheus",name="my-prometheus.example.org"} 1

Now "instance" is set to something I can grok by glancing at it. Which makes me happy.
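
If adding a name label to every target is too much typing, and assuming your targets are listed by hostname rather than IP, another option is to derive instance from the built-in __address__ label by stripping the port. This is a sketch, not taken from my config:

```yaml
relabel_configs:
  # keep the host part of "host:port" as the instance label
  - source_labels: [__address__]
    regex: "([^:]+):\\d+"
    target_label: instance
    replacement: "$1"
```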

Fin

Thanks for following along!

Speeding up composer on AWS OpsWorks

At EasyBib, we're heavy users of composer and AWS OpsWorks. Since we recently moved a lot of our applications to a continuous deployment model, the benefits of speeding up the deployment process (~4-5 minutes) became more obvious.

Composer install

Whenever we run composer install, there are a lot of round-trips between the server, our Satis instance, and GitHub (or Amazon S3).

One of my first ideas was to get around a continuous reinstall by symlinking the vendor directory between releases. This doesn't work, for two reasons I'll get to in a moment.

What's a release?

OpsWorks, or Chef in particular, calls deployed code releases.

A release is the checkout/clone/download of your application and lives in /srv/www:

srv/
└── www
    └── my_app
        ├── current -> /srv/www/my_app/releases/20131008134950
        ├── releases
        └── shared

The releases directory contains your application code, and the latest release is always symlinked into place as current.

Atomic deploys

  1. Deploys need to be atomic. We don't want to break whatever is currently online — not even for a second or a fraction of one.
  2. We have to be able to roll back deployments.

Symlinking the vendor directory between releases fails on both counts: it would break the currently running code (who knows how long the composer install or an application-server restart takes), and it would require an additional safety net to roll back a failed deploy successfully.
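
For illustration, an atomic switch is usually done by creating the new symlink next to the old one and renaming it into place, since a rename is atomic on the same filesystem. A runnable sketch with throwaway paths instead of /srv/www:

```shell
app=$(mktemp -d)
mkdir -p "$app/releases/20131008134950" "$app/releases/20131008140000"
ln -s "$app/releases/20131008134950" "$app/current"

# Build the new symlink under a temporary name, then rename it over
# "current" in one atomic step (mv -T is GNU coreutils).
ln -sfn "$app/releases/20131008140000" "$app/current.tmp"
mv -T "$app/current.tmp" "$app/current"

readlink "$app/current"
```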

Ruby & Chef to the rescue

Whenever a deployment is run, Chef allows us to hook into the process using deploy hooks. These hooks are documented for OpsWorks as well.

The available hooks are:

  • before migrate
  • before symlink (!)
  • before restart
  • after restart

In order to use them, create a deploy directory in your application and put a couple of Ruby files in there:

  • before_migrate.rb
  • before_symlink.rb
  • before_restart.rb
  • after_restart.rb

If you're a little in the know about Rails, these hooks will look familiar.

The migrate hook is typically used to run database migrations — something we don't do and probably never will. ;-) But rest assured: at this point the checkout of your application is complete; in other words, the code is on the instance.

The symlink hook is what we use to run composer install to get the web app up to speed; we'll take a closer look in a second.

Before restart is a hook used to run commands before your application server reloads — for example, purging cache directories or whatever else you want in order before /etc/init.d/php-fpm reload is executed to revive APC.

And last but not least, after restart — which our applications use to send an annotation to New Relic that we successfully deployed a new release.

Before symlink

So up until now, the before_symlink.rb looked like this:

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Note: release_path is a variable automatically available/populated in the scope of this script. If you need more, your node attributes are available as well.

Anyway, reading Scaling Symfony2 with AWS OpsWorks inspired me to attempt to copy my vendors around. But instead of doing it in a recipe, I decided to use one of the available deploy hooks for this:

app_current = ::File.expand_path("#{release_path}/../../current")
vendor_dir  = "#{app_current}/vendor"

deploy_user  = "www-data"
deploy_group = "www-data"

release_vendor = "#{release_path}/vendor"

::FileUtils.cp_r vendor_dir, release_vendor if ::File.exists?(vendor_dir)
::FileUtils.chown_R deploy_user, deploy_group, release_vendor if ::File.exists?(release_vendor)

composer_command = "/usr/local/bin/php"
composer_command << " #{release_path}/composer.phar"
composer_command << " --no-dev"
composer_command << " --prefer-source"
composer_command << " --optimize-autoloader"
composer_command << " install"

run "cd #{release_path} && #{composer_command}"

Step by step:

  • copy the current release's vendor to the new release (if it exists)
  • chown all files to the webserver (if the new vendor exists)

This allows the deploy hook to complete, even if we're on a fresh instance.
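
In shell terms, the hook boils down to something like the following sketch (throwaway directories instead of /srv/www, and the chown left as a comment since it needs the www-data user):

```shell
app=$(mktemp -d)
mkdir -p "$app/current/vendor" "$app/releases/20131008140000"
echo '<?php // composer autoloader' > "$app/current/vendor/autoload.php"

current="$app/current"
release="$app/releases/20131008140000"

# Copy the previous release's vendor dir into the new release, if present;
# on a fresh instance there simply is nothing to copy.
if [ -d "$current/vendor" ]; then
  cp -R "$current/vendor" "$release/vendor"
  # in the real hook: chown -R www-data:www-data "$release/vendor"
fi

ls "$release/vendor"
```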

Benchmarks?

Effectively, this cut deployments from 4-5 minutes to 2-3 minutes. With a tailwind, that's a 50% improvement.

FIN

That's all. Happy deploying!

Bento and VirtualBox

Last week I blogged some Vagrant tips and pretty much jinxed the run I had in the past months.

Here's how:

  • I decided to upgrade to Vagrant 1.1, which broke bento: the current bento master is incompatible with Vagrant 1.1. But selecting the right rbenv env and installing the latest available Vagrant gem (inside the rbenv environment) fixed it.

  • My base box built, but for some reason the guest additions setup broke, and while the box worked on Mac OS X, it was completely broken on Ubuntu. Don't ask me why.

Especially the last bit reminded me to share another small tip, or work-around.

Fixing up base boxes

So every once in a while something doesn't work as expected. In my case, the guest additions were installed but not loaded and also failed to load when I started a VM. This in itself wasn't so bad, but it broke the vboxfs shares and while there may be cases where you don't need that (e.g. for a database VM), a VM without your code mounted into it is pretty useless.

The fix wasn't too hard:

  • I created a blank Vagrantfile (no recipes, just a simple box definition).
  • vagrant up and wait for the box to fail.
  • vagrant ssh to enter the box, or start with the GUI option and login through it.
  • execute sudo /etc/init.d/vboxadd setup

Then, exit the VM and execute the following: vagrant package vm_name --output box_name.box.

Import the box again with vagrant box add etc., update your Vagrantfile and test — then distribute.

Thoughts

Let me use this to reiterate how critical a fixed set of versions is.

VirtualBox 4.2.x and bento (or veewee) seem to be incompatible somewhere, and even though no error surfaces while the box is built or validated, something still goes wrong. It's paramount to have a rollback of some kind before you break too many things and stall your team for too long.

Fin

That would be all for today.