
Terraform: Resource not found

Here are a few things I learned and did when I encountered the very verbose "Resource not found" error from Terraform.

Debug your Infrastructure as Code

More logs?

This is the obvious choice, my go-to. Terraform comes with different log levels, though it tells you itself that every level but TRACE is not to be trusted:

2021/03/02 09:21:33 [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility. Use TF_LOG=TRACE to see Terraform's internal logs.

FWIW, DEBUG and ERROR seem to produce okay output to narrow down problems, while TRACE seems overwhelming, which is not very helpful.

Refresh, plan?

To narrow down a problem I can run terraform refresh (or import, plan) and hope for the best, but what I found incredibly valuable was adding -target to any of them. This allows me to test resources one by one.
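For example, to refresh a single resource from the state (the resource names here are from my setup; see the list below):

$ terraform refresh -target openstack_compute_keypair_v2.ssh_key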

To retrieve a list of what is currently known to Terraform's state:

$ terraform state list
data.openstack_images_image_v2.centos
data.openstack_networking_network_v2.public_network
openstack_compute_instance_v2.jump_host
openstack_compute_keypair_v2.ssh_key
openstack_networking_network_v2.network
openstack_networking_secgroup_rule_v2.jump_host_rule
openstack_networking_secgroup_rule_v2.monitoring_rule
openstack_networking_secgroup_v2.jump_group
openstack_networking_subnet_v2.monitoring

Which seems accurate in my case.

Then I proceeded to go through each of them to find out what I may or may not know:

$ terraform plan -target openstack_compute_keypair_v2.ssh_key
...

Of course, it only failed on the one using literally everything else:

$ terraform plan -target openstack_compute_instance_v2.jump_host
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.openstack_networking_network_v2.public_network: Refreshing state... [id=foo]
data.openstack_images_image_v2.centos: Refreshing state... [id=foo]
openstack_compute_keypair_v2.ssh_key: Refreshing state... [id=foo]
openstack_networking_network_v2.network: Refreshing state... [id=foo]
openstack_networking_subnet_v2.monitoring: Refreshing state... [id=foo]
openstack_compute_instance_v2.jump_host: Refreshing state... [id=foo]

Error: Resource not found


Releasing state lock. This may take a few moments...

Provider

If you've read this far, you probably feel my pain. Let's take a look at the provider, which in my case is the OpenStack provider for Terraform. And this is where I wish I had looked yesterday.

The OpenStack provider comes with its own log level: OS_DEBUG=1. This only works with the appropriate Terraform TF_LOG= statement (spoiler: not TF_LOG=TRACE).

This is what I started out with:

$ TF_LOG=ERROR OS_DEBUG=1 terraform plan -target openstack_compute_instance_v2.jump_host
... [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility.
  Use TF_LOG=TRACE to see Terraform's internal logs.
  ----
<...snip...>
openstack_networking_subnet_v2.monitoring: Refreshing state... [id=foo]
openstack_compute_instance_v2.jump_host: Refreshing state... [id=foo]
... [ERROR] eval: *terraform.EvalRefresh, err: Resource not found
... [ERROR] eval: *terraform.EvalSequence, err: Resource not found

Error: Resource not found


Releasing state lock. This may take a few moments...

Slightly more helpful (well, not really).

Now re-run the command with TF_LOG=DEBUG; the output will then contain the API calls made to OpenStack.
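In full, that is the previous command with the log level swapped:

$ TF_LOG=DEBUG OS_DEBUG=1 terraform plan -target openstack_compute_instance_v2.jump_host

Buried in the DEBUG output is the response from the OpenStack API: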

... [DEBUG] ..._v1.32.0: Vary: OpenStack-API-Version X-OpenStack-Nova-API-Version
... [DEBUG] ..._v1.32.0: X-Compute-Request-Id: bar
... [DEBUG] ..._v1.32.0: X-Openstack-Nova-Api-Version: 2.1
... [DEBUG] ..._v1.32.0: X-Openstack-Request-Id: bar
... [DEBUG] ..._v1.32.0: 2021/03/02 11:46:21 [DEBUG] OpenStack Response Body: {
... [DEBUG] ..._v1.32.0:   "itemNotFound": {
... [DEBUG] ..._v1.32.0:     "code": 404,
... [DEBUG] ..._v1.32.0:     "message": "Flavor foobar could not be found."
... [DEBUG] ..._v1.32.0:   }
... [DEBUG] ..._v1.32.0: }

And that explains why my terraform plan fails: the flavour I used four months ago is no longer available.

Fin

If I ever get to it, I have to figure out why those error messages are not bubbled up. Or why TF_LOG=DEBUG doesn't invoke OS_DEBUG=1.

Thank you for reading. Have a great day!

Tracking PHP errors

track_errors provides the means to catch an error message emitted by PHP. It's something I like to use during the development of various applications, or to get a handle on legacy code. Here are a few examples of why!

For example

Imagine the following remote HTTP call:

$response = file_get_contents('http://example.org/');

So whenever this call fails, it will return false and also emit an error message:

Warning: file_get_contents(http://example.org):
    failed to open stream: could not connect to host in /example.php on line 2

Some people use @ to suppress this error message — an absolute no-go for reasons such as:

  • it just became impossible to know why the call failed
  • @ at runtime is an expensive operation for the PHP parser

The advanced PHP web kiddo knows to always set display_errors = Off (e.g. in php.ini or through ini_set()) in order to shield the visitors of their application from these nasty messages. And maybe they even know how to log the error — somewhere.
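For reference, that setup usually amounts to something like this in php.ini (the log path is just an example):

; php.ini
display_errors = Off
log_errors = On
error_log = /var/log/php_errors.log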

But whenever an error is logged to a log file somewhere, it also means it's buried. Sometimes these error logs are too far away and often they get overlooked. If you happen to centralize and actually analyze your logfiles, I salute you!

So how do you use PHP's very useful and informative error message to debug this runtime error?

track_errors to the rescue.

track_errors is a PHP runtime setting.

To enable it:

; php.ini
track_errors = On 

Or:

ini_set('track_errors', 1);

And this allows you to do the following:

$response = file_get_contents('http://example.org');
if ($response === false) {
    throw new RuntimeException(
        "Could not open example.org: {$GLOBALS['php_errormsg']}"
    );
}

The last error message is always populated in the global variable $php_errormsg.

You want more?

I also recently used the same technique to implement error checking into a legacy application. I basically did the following:

// footer.php ;-)
if (!empty($GLOBALS['php_errormsg']) && $user->isAdmin()) {
    echo $GLOBALS['php_errormsg'];
}

Trade-offs

As useful as this can be, there are a number of trade-offs. I suggest you use it wisely.

  • $php_errormsg is a global variable [yuck]
  • many extensions provide built-in ways to catch errors
  • ini_set() calls at runtime are expensive

Fin

That's all, quick and (very) dirty.

Debugging Zend_Test

Sometimes I have to debug unit tests, and that is usually a situation I try to avoid.

If I have to spend too much time debugging a test, it's usually a bad test, which usually means it's too complex. With Zend_Test_PHPUnit_ControllerTestCase, however, it's often not the actual test but the framework. This is not just tedious for me; it's also not exactly motivating when I ask my developers to write tests.

An example

The unit test fails with something like:

Failed asserting last module used <"error"> was "default".

Translated, this means the following:

  • The obvious: an error occurred.
  • The error was caught by our ErrorController.
  • Things I need to find out:
    • What error actually occurred?
    • Why did it occur?
    • Where did the error occur?

The last three questions are especially tricky and drive me nuts on a regular basis, because a unit test should never withhold these things from you. After all, we use these tests to catch bugs to begin with. Why make it harder for the developer to fix them?

In my example an error occurred, but debugging Zend_Test also kicks in when things supposedly go according to plan. Follow me to the real life example.

Real life example

I have an Api_IndexController where requests to my API are validated in its preDispatch().

Whenever a request is not validated, I will issue "HTTP/1.1 401 Unauthorized". For the sake of this example, this is exactly what happens.

class Api_IndexController extends Zend_Controller_Action
{
    protected $authorized = false;
    
    public function preDispatch()
    {
        // authorize the request
        // ...
    }
    public function profileAction()
    {
        if ($this->authorized === false) {
            $this->getResponse()->setRawHeader('HTTP/1.1 401 Unauthorized');
        }
        // ...
    }
}

Here's the relevant test case:

class Api_IndexControllerTest extends Zend_Test_PHPUnit_ControllerTestCase
{

    public function testUnAuthorizedHeader()
    {
        $this->dispatch('/api/profile'); // unauthorized
        $this->assertResponseCode(401);
    }
}

The result:

1) Api_IndexControllerTest::testUnAuthorizedHeader
Failed asserting response code "401"

/project/library/Zend/Test/PHPUnit/Constraint/ResponseHeader.php:230
/project/library/Zend/Test/PHPUnit/ControllerTestCase.php:773
/project/tests/controllers/api/IndexControllerTest.php:58

Not very useful, eh?

Debugging

Before you step through your application with print, echo and an occasional var_dump, here's a much better way to see what went wrong.

I'm using a custom Listener for PHPUnit, which works sort of like an observer. This allows me to see where I made a mistake without hacking around in Zend_Test.
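To illustrate the idea, here is a minimal sketch of such a listener; the class name is made up, and only the failure hook does real work (my actual implementation is linked below):

class My_Response_Dumper implements PHPUnit_Framework_TestListener
{
    public function addFailure(PHPUnit_Framework_Test $test,
        PHPUnit_Framework_AssertionFailedError $e, $time)
    {
        // only Zend_Test test cases carry a response object to dump
        if (!($test instanceof Zend_Test_PHPUnit_ControllerTestCase)) {
            return;
        }
        $response = $test->getResponse();
        echo "Test '" . $test->getName() . "' failed.\n";
        echo "Status Code: " . $response->getHttpResponseCode() . "\n";
        echo "Body:\n" . $response->getBody() . "\n";
    }

    // the interface requires these as well; they are no-ops in this sketch
    public function addError(PHPUnit_Framework_Test $test, Exception $e, $time) {}
    public function addIncompleteTest(PHPUnit_Framework_Test $test, Exception $e, $time) {}
    public function addSkippedTest(PHPUnit_Framework_Test $test, Exception $e, $time) {}
    public function startTest(PHPUnit_Framework_Test $test) {}
    public function endTest(PHPUnit_Framework_Test $test, $time) {}
    public function startTestSuite(PHPUnit_Framework_TestSuite $suite) {}
    public function endTestSuite(PHPUnit_Framework_TestSuite $suite) {}
}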

Here is how it works

Discover my PEAR channel:

sudo pear channel-discover till.pearfarm.org

Install:

till@till-laptop:~/ sudo pear install till.pearfarm.org/Lagged_Test_PHPUnit_ControllerTestCase_Listener-alpha
downloading Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0.tgz ...
Starting to download Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0.tgz (2,493 bytes)
....done: 2,493 bytes
install ok: channel://till.pearfarm.org/Lagged_Test_PHPUnit_ControllerTestCase_Listener-0.1.0

If you happen to not like PEAR (What's wrong with you? ;-)), the code is also on github.

Usage

This is my phpunit.xml:

<?xml version="1.0" encoding="utf-8"?>
<phpunit bootstrap="./TestInit.php" colors="true" syntaxCheck="true">
    <testsuites>
    ...
    </testsuites>
    <listeners>
        <listener class="Lagged_Test_PHPUnit_ControllerTestCase_Listener" file="Lagged/Test/PHPUnit/ControllerTestCase/Listener.php" />
    </listeners>
</phpunit>

Output

Whenever I run my test suite and a test fails, it will add something like this to the output of PHPUnit:

PHPUnit 3.4.15 by Sebastian Bergmann.

..Test 'testUnAuthorizedHeader' failed.
RESPONSE

Status Code: 200

Headers:

     Cache-Control - public, max-age=120 (replace: 1)
     Content-Type - application/json (replace: 1)
     X-Ohai - WADDAP (replace: false)

Body:

{"status":"error","msg":"Not authorized"}

F..

Time: 5 seconds, Memory: 20.50Mb

There was 1 failure:

1) Api_IndexControllerTest::testUnAuthorizedHeader
Failed asserting response code "401"

/project/library/Zend/Test/PHPUnit/Constraint/ResponseHeader.php:230
/project/library/Zend/Test/PHPUnit/ControllerTestCase.php:773
/project/tests/controllers/api/IndexControllerTest.php:58

FAILURES!
Tests: 5, Assertions: 12, Failures: 1.

Analyze

Analyzing the output, I realize that my status code was never set, even though I used a setRawHeader() call to set it. It turns out setRawHeader() is not parsed, so the status code in Zend_Controller_Response_Abstract is never updated.

IMHO this is also a bug and a limitation of the framework or Zend_Test.

The quickfix is to do the following in my action:

$this->getResponse()->setHttpResponseCode(401);

Fin

That's all. Quick, but not so dirty. If you noticed, I got away without hacking Zend_Test or PHPUnit.

The listener pattern provides us with very powerful methods to hook into our test suite. If you look at the source code, you'll see it also contains methods for skipped tests, errors, and test suite start and end.

PHP, APC and sessions

Playing with redis/Rediska and sessions, I wanted to get more numbers to compare this solution to a traditional MySQL-based approach which also made me revisit the idea of a CouchDB-based session handler for Zend_Session.

Implementing this handler, I ran into a weird issue:

Fatal error: Undefined class constant 'ALLOW_ALL' in /usr/home/till/foo/trunk/library/Zend/Uri/Http.php on line 447
Call Stack
#   Time    Memory  Function    Location
1   0.7357  3914816 Foo_Session_SaveHandler_Couchdb->write( )   ../Couchdb.php:0
2   0.7358  3916600 Foo_Couchdb->query( )   ../Couchdb.php:94
3   0.7361  3969464 Zend_Http_Client->__construct( )    ../Couchdb.php:368
4   0.7361  3969544 Zend_Http_Client->setUri( ) ../Client.php:250
5   0.7362  3976568 Zend_Uri::factory( )    ../Client.php:267
6   0.7365  4003352 Zend_Uri_Http->__construct( )   ../Uri.php:130
7   0.7367  4006216 Zend_Uri_Http->valid( ) ../Http.php:154
8   0.7368  4006216 Zend_Uri_Http->validateHost( )  ../Http.php:281

The funny thing is that APC had been added to the game at the same time (for apc_store() and apc_fetch(), to cache the configuration), and when I disabled it, the error disappeared.

Talking to one of the leads of APC, Gopal (btw, cheers for helping!), on IRC (#pecl@efnet), I thought at first that the issue was autoload-related, and we suspected the order in which the extensions are loaded might make a difference. From Rasmus' comment, I later discovered bug #16745 with a proposed workaround to use session_write_close().

On a sidenote: I'm still not sure why the error is expected behavior for some people, yet it works with some PHP and APC versions and breaks with others. From what I gathered, it broke for me with 5.2.6, 5.2.11 and 5.3.2, all tried with the latest version of APC (3.1.3p1).

Here's how I fixed it for myself

I have a Lagged_Application class to bootstrap my application. Lagged_Application is kind of like Zend_Application, sans a lot of safety nets and magic. Since it does a lot less, it's also quite a bit faster. To get an idea, check out my Google Code repository (for an, alas, rather outdated version of it).

I added the following function to it:

<?php
// (...)
public function shutdown()
{
    session_write_close();
}

My index.php looks like the following:

<?php
include 'library/Lagged/Application.php';
$app = new Lagged_Application;
$app->setEnvironment('production');
$app->bootstrap();

register_shutdown_function(array($app, 'shutdown'));

Somewhat related — shutdown() could be a good start to tear down other objects as well, when needed.

More?

Now that this issue is fixed, I think the infamous Fatal error: Exception thrown without a stack frame in Unknown on line 0 also originates from the same issue, that is, when sessions and APC are both around. But I should dig a little deeper to verify this.

All in all, it's a pretty weird issue and IM(very)HO, objects shouldn't be torn down or some sort of before hook should be executed to avoid this. But that's especially easy to say if you don't do C. :-)

Fin

That's all. I sure hope this saves someone else some time.

Operating CouchDB

These are some random operational things I learned about CouchDB. While I realize that my primary use-case (a CouchDB install with currently 230+ million documents) may be oversized for many, these are still important things to know and consider. And I would have loved to know some of these before we grew that large.

I hope these findings are useful for others.

Compaction

CouchDB doesn't take great care of diskspace — the assumption is that disk is cheap. To get on top of it, you should run database and view compaction as often as you can. And the good news is, these operations help you to reclaim a lot of space (e.g. I've seen an uncompacted view of 200 GB trim down to ~30 GB).
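Both are a simple POST against CouchDB. Assuming a database called helloworld and a design document _design/stats, that might look like:

curl -H "Content-Type: application/json" -X POST http://server/helloworld/_compact
curl -H "Content-Type: application/json" -X POST http://server/helloworld/_compact/stats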

Cleanup

In case you changed the actual view code, make sure to run the clean-up command (curl -X POST http://server/db/_view_cleanup) to regain disk space.

Performance impact

Database and view compaction (especially on larger sets) will slow down reads and writes considerably. Schedule downtime, or do it in off-peak hours if you think the operation is fast enough.

This is especially hideous when you run in a cloud environment where disk i/o generally sucks (OHAI, EBS!!!).

To stop either of those background-tasks, just restart CouchDB.

(Just fyi, the third option is of course to throw resources (hardware) at it.)

Resuming view compaction?

HA, HA! [Note, sarcasm!] — view compaction is not resumable the way database compaction is.

View files

I suggest you split views into several design documents — this will have the following benefit.

For each design document, CouchDB will create a .view file (by default these live in /var/lib/couchdb/database-name/.database-name_design/). It's just faster to run compact and cleanup operations on multiple (hopefully smaller) files versus one giant file.

In the end, you don't run the operation against the file directly, but against CouchDB — but CouchDB will deal with a smaller file, which makes the operation faster and generally shorter. I call this poor man's view partitioning.

Warming the cache

Cache warming means populating a cache ahead of time, so that the cache and the server behind it aren't hit with too much traffic when a server starts up. Here is what you can do with CouchDB in this regard.

The basics are obvious — updates to a CouchDB view are performed when the view is requested. This has certain advantages and works well in most situations. Something I noticed, though: especially on smaller VPS servers, where resources tend to be oversold and are scarce in general, generating view updates can slow your application down to a full stop.

As a matter of fact, CouchDB often does not respond at all during that operation once the disk is saturated (take into account that even a 2 GB database gets hard to work with if you only have 1 GB of RAM for CouchDB, the OS, and whatever else is running on the same server).

The options are:

  1. Get more traffic, so views are constantly updated and the updates to perform stay minimal.
  2. Make your application query views with ?stale=ok and update the views on a set interval instead, for example via a curl request in a cronjob (see the sketch below).

Cache-warming for dummies, the CouchDB way.
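Here is a sketch of option 2; the design document (stats) and view (by_date) names are made up:

# the application always reads without triggering an index update
curl 'http://server/helloworld/_design/stats/_view/by_date?stale=ok'

# crontab: query the view every five minutes so the index stays current
*/5 * * * * curl -s 'http://server/helloworld/_design/stats/_view/by_date?limit=1' >/dev/null 2>&1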

View data

For various reasons, such as space management and performance, it doesn't hurt to put all views on their own dedicated partition.

In order to do this, add the following to your local.ini (in [couchdb]):

view_index_dir = /new_device

And assuming your database is called "helloworld" and the view dir is /new_device, your .view-files will end up in /new_device/.helloworld_design.

Overshard

I've blogged on CouchDB and CouchDB-Lounge before. No matter if you use the Lounge or build sharding into your application — consider it. From what I learned, it's better to shard early (= overshard) than when it's too late. The faster your CouchDB grows, the more painful it will be to deal with all the data stuck in it.

My suggestion is that when you rapidly approach 50,000,000 documents and see yourself exceeding (and maybe doubling) this number soon, take a deep breath and think about a sharding strategy.

Oversharding has the advantage that you can, for example, run 10 CouchDB instances on the same server and then move each of them (or a couple) to dedicated hardware once they exceed the resources of the shared box.

If sharding is not your cup of tea, just outsource to Cloudant — they do a great job.

CouchDB-Lounge

CouchDB-Lounge is Meebo's Python-based sharding framework for CouchDB. The Lounge is essentially an nginx webserver and a twistd service which proxy your requests to different shards using a shards.conf. The number of shards and the level of redundancy are both defined in it.

CouchDB-Lounge is a great, albeit young, project. The current shortcomings IMHO include the general stability of the twistd service and the absence of features such as _bulk_docs, which makes migrating a larger set into CouchDB-Lounge a tedious operation. Nevertheless, this is something to keep an eye on.

Related to CouchDB-Lounge, there's also lode — a JavaScript- and node.js-based rewrite of the Python lounge.

Erlang-Lounge

What I call Erlang-Lounge is Cloudant's internal Erlang-based sharding framework for CouchDB. It's in production at Cloudant and is to be released soon. From what I know, Cloudant will probably offer a free open-source version and support once they release it.

Disk, CPU or memory — which is it?

This one is hard to say. But despite how awesome Erlang is, even CouchDB depends on the system's available resources. ;-)

Disk

For starters, disk i/o is almost always the bottleneck. To verify whether this is the bottleneck in your particular case, run and analyze iostat during operations which appear to be slow in your context. For everyone on AWS, consider a RAID-0 setup; for everyone else, buy faster disks.

CPU

The more CPUs in a server, the more beam processes. CouchDB (or rather Erlang) seems to take great advantage of this resource. I haven't really figured out a connection between CPU and general performance though, because in my case memory and disk were always the bottleneck.

Memory

... seems to be another underestimated bottleneck. For example, I noticed that replication can slow down to a state where it seems faster to copy-paste documents from one instance to another when CouchDB is unable to cache an entire b-tree in RAM.

We've been testing some things on a nifty high-memory 4XL AWS instance, and during a compact operation almost 90% of the RAM (70 GB) was used by the OS for caching. Don't make my mistake and rely on (h)top to verify this: cat /proc/meminfo instead.

Caching

Caching is trivial with CouchDB.

e-tags

Each document and view responds with an Etag header — here is an example:

curl -I http://foo:bar@localhost:5984/foobar/till-laptop_1273064525
HTTP/1.1 200 OK
Server: CouchDB/0.11.0a-1.0.7 (Erlang OTP/R13B)
Etag: "1-92b4825ffe4c61630d3bd9a456c7a9e0"
Date: Wed, 05 May 2010 13:20:12 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1771
Cache-Control: must-revalidate

The Etag only changes when the data in the document changes. Hence it's trivial to avoid hitting the database if you don't have to. The request above is a very lightweight HEAD request which only fetches the headers and does not pull the entire document.
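To put it to use, send the Etag back in an If-None-Match header (host and credentials as in the example above); if the document is unchanged, CouchDB answers with 304 Not Modified and no body:

curl -i -H 'If-None-Match: "1-92b4825ffe4c61630d3bd9a456c7a9e0"' http://foo:bar@localhost:5984/foobar/till-laptop_1273064525
HTTP/1.1 304 Not Modified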

_changes

_changes represents a live-update feed of your CouchDB database. It's located at http://server/dbname/_changes.

Whenever a data-changing operation completes, _changes will reflect that, which makes it easy for a developer to stay on top of things and, for example, invalidate an application cache only when needed (and not, as is usually done, when the cache expires).
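For example, to poll all changes since the beginning, or to keep the connection open and stream updates as they happen:

curl 'http://server/dbname/_changes?since=0'
curl 'http://server/dbname/_changes?feed=continuous'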

Logging

Logrotate

First off, a lot of people run CouchDB from source, which means that in 99% of all installs log rotation is not activated.

To fix this (on Ubuntu/Debian), do the following:

ln -s /usr/local/etc/logrotate.d/couch /etc/logrotate.d/couchdb

Make sure to familiarize yourself a little with logrotate because, depending on the available space and how busy your installation is, you should adjust the config a little so you don't run out of diskspace. If CouchDB is unable to log, it will crash.

Loglevel

In most cases it's more than alright to just run with a log level of error.

Add the following to your local.ini (in [log]):

level = error

Log directory

Still running out of diskspace? Add the following to your local.ini (in [log]):

file = /path/to/more/diskspace/couch.log

... if you adjusted the above, you will need to correct the config for logrotate.d as well.

No logging?

Last but not least — if no logs are needed, just turn them off completely.

Fin

That's all, kids.