
Tuesday, May 17, 2011

On Noah - Part 2


This is the second part in a series on Noah. Part 1 is available here.

In part one of this series, I went over a little background about ZooKeeper and how the basic Zookeeper concepts are implemented in Noah. In this post, I want to go over a little bit about a few things that Noah does differently.

Noah Primitives

As mentioned in the previous post, Noah has 5 essential data types, four of which are what I've interchangeably referred to as either Primitives or Opinionated models. The four primitives are Host, Service, Application and Configuration. The idea was to map some common use cases for Zookeeper and Noah onto a set of objects that users would find familiar.

You might detect a bit of Nagios inspiration in the first two.

Host
Analogous to a traditional host or server. The machine or instance running the operating system. Unique by name.
Service
Typically mapped to something like HTTP or HTTPS. Think of this as the listening port on a Host. Services must be bound to Hosts. Unique by service name and host name.
Application
Apache, your application (rails, php, java, whatever). There's a subtle difference here from Service. Unique by name.
Configuration
A distinct configuration element. Has a one-to-many relationship with Applications. Supports limited mime typing.

Hosts and Services have a unique attribute known as status. This is a required attribute and is one of up, down or pending. These primitives would work very well integrated into the OS init process. Since Noah is curl-friendly, you could add something globally to init scripts that updated Noah when your host is starting up or when some critical init script starts. If you were to imagine Noah primitives as part of the OSI model, these are analogous to Layers 2 and 3.
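Since Noah is curl-friendly, that init-script hook really is just one HTTP call. Here's a rough sketch of the same idea in Python so the payload is explicit; the /hosts/<name> path, the field names and the PUT verb are my assumptions for illustration, not Noah's documented API:

    # Sketch: report this host as "up" to Noah during boot.
    # The /hosts/<name> path and the payload fields are assumptions, not the documented API.
    import json
    import socket

    import requests

    NOAH = "http://noahserver"  # assumed Noah base URL
    payload = {"name": socket.gethostname(), "status": "up"}

    resp = requests.put(NOAH + "/hosts/" + payload["name"],
                        data=json.dumps(payload),
                        headers={"Content-Type": "application/json"})
    resp.raise_for_status()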

Applications and Configurations are intended to feel more like Layer 7 (again, using our OSI model analogy). The differentiation is that your application might be a Sinatra or Java application that has a set of Configurations associated with it. Interestingly enough, you might choose to have something like Tomcat act as both a Service AND an Application. The aspect of Tomcat as a Service is different than the Java applications running in the container or even Tomcat's own configurations (such as logging).

One thing I'm trying to pull off with Configurations is limited mime-type support. When creating a Configuration in Noah, you can assign a format attribute. Currently 3 formats or types are understood:

  • string
  • json
  • yaml

The idea is that, if you provide a type, we will serve that content back to you in that format when you request it (assuming you request it that way via HTTP headers). This should allow you to skip parsing the JSON representation of the whole object and instead use the content directly. Right now this list is hardcoded. I have a task open to change that.
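As a hedged example of where that's handy, imagine a Configuration stored with the yaml format. The sketch below assumes a /configurations/<app>/<name> route and that the Accept header is how you ask for the raw format; both are guesses based on the description above:

    # Sketch: fetch a Configuration stored as yaml and use it directly, no JSON unwrapping.
    # The route and the Accept header behavior are assumptions based on this post.
    import requests
    import yaml  # PyYAML

    NOAH = "http://noahserver"  # assumed Noah base URL

    resp = requests.get(NOAH + "/configurations/myapp/database",
                        headers={"Accept": "text/x-yaml"})
    settings = yaml.safe_load(resp.text)  # use the YAML body as-is
    print(settings)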

Hosts and Services make a great "canned" structure for building a monitoring system on top of Noah. Applications and Configurations are a lightweight configuration management system. Obviously there are more uses than that but it's a good way to look at it.

Ephemerals

Ephemerals, as mentioned previously, are closer to what Zookeeper provides. The way I like to describe Ephemerals to people is a '512 byte key/value store with triggers' (via Watch callbacks). If none of the Primitives fit your use case, the Ephemerals make a good place to start. Simply send some data in the body of your post to the url and the data is stored there. No attempt is made to understand or interpret the data. The hierarchy of objects in the Ephemeral namespace is completely arbitrary. Data living at /ephemerals/foo has no relationship with data living at /ephemerals/foo/bar.
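A quick sketch of that interaction, using the /ephemerals/foo path from the example above (the exact verbs Noah expects here are an assumption on my part):

    # Sketch: store an opaque blob under the Ephemeral namespace and read it back.
    # The PUT/GET verbs against /ephemerals/... are assumptions based on the description above.
    import requests

    NOAH = "http://noahserver"  # assumed Noah base URL

    requests.put(NOAH + "/ephemerals/foo", data="any opaque payload up to 512 bytes")
    print(requests.get(NOAH + "/ephemerals/foo").text)  # comes back exactly as stored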

Ephemerals are also not browseable except via Linking and Tagging.

Links and Tags are, as far as I can tell, unique to Noah compared to Zookeeper. Because we namespace against Primitives and Ephemerals, there existed the need to visualize objects under a custom hierarchy. Currently Links and Tags are the only way to visualize Ephemerals in a JSON format.

Tags are pretty standard across the internet by now. You might choose to tag a bunch of items as production or perhaps group a set of Hosts and Services as out-of-service. Tagging an item is a simple process in the API. Simply PUT the name of the tag(s) to the url of a distinct named item appended by tag. For instance, the following JSON posted to /applications/my_kick_ass_app/tag will tag the Application my_kick_ass_app with the tags sinatra, production and foobar:

{"tags":["sinatra", "production", "foobar"]}

Links work similarly to Tags (including the act of linking) except that the top-level namespace is replaced with the name of the Link. The top-level namespace in Noah, for the purposes of Watches, is //noah. By linking a group of objects together, you will be able to perform operations such as Watches in bulk (not yet implemented). For instance, if you wanted to be informed of all changes to your objects in Noah, you would create a Watch against //noah/*. This works fine for most people, but imagine you wanted a more multi-tenant friendly system. By using links, you can group ONLY the objects you care about and create the watch against that link. So //noah/* becomes //my_organization/* and only changes to items in that namespace will fire that Watch.

The idea is also that other operations outside of setting Watches can be applied to the underlying object in the link as well. The name Link was inspired by the idea of symlinking.

Watches and Callbacks

In the first post, I mentioned that by nature of Noah being "disconnected", Watches were persistent as opposed to one-shot. Additionally, because of the pluggable nature of Noah Watches and because Noah has no opinion regarding the destination of a fired Watch, it becomes very easy to use Noah as a broadcast mechanism. You don't need to have watches for each interested party. Instead, you can create a callback plugin that could dump the messages on an ActiveMQ Fanout queue or AMQP broadcast exchange. You could even use multicast to notify multiple interested parties at once.

Again, the act of creating a watch and the destination for notifications is entirely disconnected from the final client that might use the information in that watch event.

Additionally, because of how changes are broadcast internally to Noah, you don't even have to use the "official" Watch method. All actions in Noah are published post-commit to a pubsub queue in Redis. Any language that supports Redis pubsub can attach directly to the queue and PSUBSCRIBE to the entire namespace or a subset. You can write your own engine for listening, filtering and notifying clients.
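For instance, here's a minimal sketch of attaching to that pubsub traffic with redis-py; the "//noah*" channel pattern is an assumption based on the namespace described above, and the Redis location would be whatever your Noah instance uses:

    # Sketch: listen to Noah's post-commit pubsub traffic directly, bypassing Watches entirely.
    # The "//noah*" pattern is an assumption based on the namespace described in this post.
    import redis

    r = redis.Redis(host="localhost", port=6379)
    sub = r.pubsub()
    sub.psubscribe("//noah*")  # or a narrower pattern for a subset of the namespace

    for message in sub.listen():
        if message["type"] == "pmessage":
            print(message["channel"], message["data"])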

This is exactly how the Watcher daemon works. It attaches to the Redis pubsub queue, makes a few API calls for the current registered set of watches and then uses the watches to filter messages. When a new watch is created, that message is like any other change in Noah. The watcher daemon sees that and immediately adds it to its internal filter. This means that you can create a new watch, immediately change the watched object and the callback will be made.

Wrap up - Part Two

So to wrap up:

  • Noah has 5 basic "objects" in the system. Four of those are opinionated and come with specific contracts. The other is a "dumb" key/value store of sorts.
  • Noah provides Links and Tags as a way to perform logical grouping of these objects. Links replace the top-level hierarchy.
  • Watches are persistent. The act of creating a watch and notifying on watched objects is disconnected from the final recipient of the message. System A can register a watch on behalf of System B.
  • Watches are nothing more than a set of filters applied to a Redis pubsub queue listener. Any language that supports Redis and its pubsub queue can be a processor for watches.
  • You don't even have to register any Watches in Noah if you choose to attach and filter yourself.

Part three in this series will discuss the technology stack under Noah and the reasoning behind it. A bit of that was touched on in this post. Part four is the discussion about long-term goals and roadmaps.

Monday, May 16, 2011

On Noah - Part 1


This is the first part in a series of posts going over Noah.

As you may have heard (from my own mouth no less), I've got a smallish side project I've been working on called Noah.

It's a project I've been wanting to work on for a long time now and earlier this year I got off my ass and started hacking. The response has been nothing short of overwhelming. I've heard from so many people how they're excited for it and nothing could drive me harder to work on it than that feedback. To everyone who doesn't run away when I talk your ear off about it, thank you so much.

Since I never really wrote an "official" post about it, I thought this would be a good opportunity to talk about what it is, what my ideas are and where I'd like to see it go in the future.

So why Noah?

Fair warning: much of the following may duplicate information in the Noah wiki.

The inspiration for Noah came from a few places but the biggest inspiration is Apache Zookeeper. Zookeeper is one of those things that by virtue of its design is a BUNCH of different things. It's all about perspective. I'm going to (yet again) paste the description of Zookeeper straight from the project site:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Now that might be a bit confusing at first. Which is it? Is it a configuration management system? A naming system? It's all of them and, again, it's all about perspective.

Zookeeper, however, has a few problems for my standard use case.

  • Limited client library support
  • Requires persistent connections to the server for full benefit

By the first, I mean that the only official language bindings are C and Java. There's contributed Python support and Twitter maintains a Ruby library. However, all of these bindings are "native" and must be compiled. There are also command-line clients you can use for interacting - one in Java and two C flavors.

The second is more of a showstopper. Zookeeper uses the client connection to the server as in-band signaling. This is how watches (discussed in a moment) are communicated to clients. Persistent connections are simply not always an option. I can't deploy something to Heroku or AppEngine that requires that persistent connection. Even if I could, it would be cost-prohibitive and honestly wouldn't make sense.

Looking at the list of features I loved about ZK, I thought "How would I make that work in the disconnected world?". By that I mean what would it take to implement any or all of the Zookeeper functionality as a service that other applications could use?

From that thought process, I came up with Noah. The name is only a play on the concept of a zookeeper and holds no real significance other than irritating at least two people named Noah when I talk about the project.

So working through the feature list, I came up with a few things I REALLY wanted. I wanted Znodes, Watches and I wanted to do it all over HTTP so that I could have the broadest set of client support. JSON is really the de facto standard for web "messaging" at this point so that's what I went with. Basically the goal was "If your language can make HTTP requests and parse JSON, you can write a Noah client".
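To make that goal concrete, here's a hedged sketch of an entire "client" under that assumption; the /h/somehost route is a made-up example, not a documented path:

    # Sketch: the whole of a minimal Noah "client" - an HTTP GET plus a JSON parse.
    # The /h/somehost path is hypothetical, purely to show the shape of the interaction.
    import requests

    host = requests.get("http://noahserver/h/somehost").json()
    print(host)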

Znodes and Noah primitives

Zookeeper has a shared hierarchical namespace similar to a UNIX filesystem. Points in the hierarchy are called znodes. Essentially these are arbitrary paths where you can store bits of data - up to 1MB in size. These znodes are unique absolute paths. For instance:

    /
      /systems
        /foo
        /bar
      /networks
        /kansas
          /router-1
          /router-2

Each fully qualified path is a unique znode. Znodes can be ephemeral or persistent. Zookeeper also has some primitives that can be applied to znodes such as 'sequence'.

When I originally started working on Noah, so that I could work with a model, I created some base primitives that would help me demonstrate an example of some of the use cases:

  • Host
  • Service
  • Application
  • Configuration

These primitives were actual models in the Noah code base with a strict contract on them. As an example, Hosts must have a status and can have any number of services associated with them. Services MUST be tied explicitly to a host. Applications can have Configurations (or not) and Configurations can belong to any number of Applications or not. Additionally, I had another "data type" that I was simply calling Ephemerals. This is similar to the Zookeeper znode model. Originally I intended for Ephemerals to be just that - ephemeral. But I've backed off that plan. In Noah, Ephemerals can be either persistent or truly ephemeral (not yet implemented).

So now I had a data model to work with. A place to store information and flexibility to allow people to use the predefined primitives or the ephemerals for storing arbitrary bits of information.

Living the disconnected life

As I said, the model for my implementation was "disconnected". When thinking about how to implement Watches in a disconnected model, the only thing that made sense to me was a callback system. Clients would register an interest on an object in the system and when that object changed, they would get notified by the method of their choosing.

One thing about Watches in Zookeeper that annoys me is that they're one-shot deals. If you register a watch on a znode, once that watch is triggered, you have to REREGISTER the watch. First off this creates, as documented by the ZK project, a window of opportunity where you could miss another change to that znode. Let's assume you aren't using a language where interacting with Zookeeper is a synchronous process:

  • Connect to ZK
  • Register watch on znode
  • Wait
  • Change happens
  • Watch fires
  • Process watch event
  • Reregister watch on znode

In between those last two steps, you risk missing activity on the znode. In the Noah world, watches are persistent. This makes sense for two reasons. The first is that the latency between a watch callback being fired and processed could be much higher than over ZK's persistent connection, so the window for missed messages is simply much greater. We could easily be talking hundreds of milliseconds of latency just to get the message, and more still to reregister the watch.

Secondly, the registration of Watches in Noah is, by nature of Noah's design and as a byproduct, disconnected from the consumer of those watches. This offers much greater flexibility in what watches can do. Let's look at a few examples.

First off, it's important to understand how Noah handles callbacks. The message format of a callback in Noah is simply a JSON representation of the changed state of an object and some metadata about the action taken (i.e. delete, create, update). Watches can be registered on distinct objects, a given path (and thus all the children under that path) and further refined down to a given action. Out of the box, Noah ships with one callback handler - http. This means that when you register a watch on a path or object, you provide an http endpoint where Noah can post the aforementioned JSON message. What you do with it from there is up to you.
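To sketch the consuming side, here's a bare-bones endpoint that the http callback handler could POST those JSON messages to; the "action" field name is my assumption, since the post only says the payload carries the changed object plus metadata about the action taken:

    # Sketch: a minimal endpoint for Noah's http callback handler to POST watch events to.
    # The "action" field name is an assumption; the exact payload shape isn't spelled out here.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class WatchHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")
            print("watch fired:", event.get("action"), event)
            self.send_response(200)
            self.end_headers()

    HTTPServer(("", 9090), WatchHandler).serve_forever()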

By virtue of the above, the callback system is also designed to be 'pluggable' for lack of a better word. While the out of the box experience is an http post, you could easily write a callback handler that posted the message to an AMQP exchange or wrote the information to disk as a flat file. The only requirement is that you represent the callback location as a single string. The string will be parsed as a url and broken down into tokens that determine which plugin to call.

So this system allows for you to distribute watches to multiple systems with a single callback. Interestingly enough, this same watch callback system forms the basis of how Noah servers will share changes with each other in the future.

Wrap up - Part 1

So wrapping up what I've discussed, here are the key takeaways:

  • Noah is a 'port' of specific Zookeeper functionality to a disconnected and asynchronous world
  • Noah uses HTTP and JSON as the interface to the server
  • Noah has both traditional ZK-style Ephemerals as well as opinionated Primitives
  • Noah uses a pluggable callback system to approximate the Watch functionality in Zookeeper
  • Clients can be written in any language that can speak HTTP and understand JSON (yes, even a shell script)

Part 2 and beyond

In part two of this series we'll discuss some of the additions to Noah that aren't a part of Zookeeper such as Tags and Links. Part 3 will cover the underlying technology which I am intentionally not discussing at this point. Part 4 will be a roadmap of my future plans for Noah.

Tuesday, March 8, 2011

Ad-Hoc Configuration, Coordination and the value of change

For those who don't know, I'm currently in Boston for DevOps Days. It's been amazing so far and I've met some wonderful people. One thing that was REALLY awesome was the open space program that Patrick set up. You won't believe it works until you've tried it. It's really powerful.

In one of our open spaces, the topic of ZooKeeper came up. At this point I made a few comments, and at the additional prodding of everyone went into a discussion about ZooKeeper and Noah. I have a tendency to monopolize discussions around topics I'm REALLY passionate about, so many thanks to everyone who insisted I go on ;)

Slaughter the deviants!
The most interesting part of the discussion about ZooKeeper (or at least the part I found most revealing) was that people tended to have trouble really seeing the value in it. One of the things I've really wanted to do with Noah is provide (via the wiki) some really good use cases about where it makes sense.

I was really excited to get a chance to talk with Alex Honor (one of the co-founders of DTO along with Damon Edwards) about his ideas after his really interesting blog post around ad-hoc configuration. If you haven't read it, I suggest you do so.

Something that often gets brought up and, oddly, overlooked at the same time is where ad-hoc change fits into a properly managed environment (using a tool like puppet or chef).

At this point, many of you have gone crazy over the thought of polluting your beautifully organized environment with something so dirty as ad-hoc changes. I mean, here we've spent all this effort on describing our infrastructure as code and you want to come in and make a random, "undocumented" change? Perish the thought!

However, as with any process or philosophy, strict adherence without understanding WHEN to deviate will only lead to frustration. Yes, there is a time to deviate, and knowing when is the next level of maturity in configuration management.

So when do I deviate
Sadly, knowing when it's okay to deviate is as much a learning experience as it was getting everything properly configured in the first place. To make it even worse, that knowledge is most often specific to the environment in which you operate. The whole point of the phrase ad-hoc is that it's...well...ad-hoc. It's one part improvisation, half a part stumbling in the dark, and the rest is backfilled with a corpus of experience. I don't say this to sound elitist.

So, really, when do I deviate? When, where, why and how do I deviate from this beautifully described environment? Let's go over some use cases and point out that you're probably ALREADY doing it to some degree.

Production troubleshooting
The most obvious example of acceptable deviation is troubleshooting. We pushed code, our metrics are all screwed up and we need to know what the hell just happened. Let's crank up our logging.

At this point, having changed your log level, you've deviated from what your system of record (your CM tool) says you should be. Our manifests, our cookbooks, our templates all have us using a loglevel of ERROR, but we just bumped one server up to DEBUG so we could troubleshoot. That system is now a snowflake. Unless you change that log level back to ERROR, you now have one system that, until the next puppet or chef-client run, is different from all the other servers of that class/role.

Would you codify that in the manifest? No. This is an exception. A (should be) short-lived exception to the rules you've defined.

Dynamic environments
Another area where you might deviate is in highly elastic environments. Let's say you've reached the holy grail of elasticity. You're growing and shrinking capacity based on some external trigger. You can't codify this. I might run 20 instances of my app server now but drop back down to 5 instances when the "event" has passed. In a highly elastic environment, are you running your convergence tool after every spin up? Not likely. In an "event" you don't want to have to take down your load balancer (and thus affect service to the existing instances) just to add capacity. A bit of a contrived example but you get the idea.

So what's the answer?
I am by far not the smartest cookie in the tool shed but I'm opinionated so that has to count for something. These "exception" events are where I see additional tools like Zookeeper (or my pet project Noah) stepping in to handle things.

Distributed coordination, dynamically reconfigurable code, elasticity and environment-aware applications.
These are all terms I've used to describe this concept to people. Damon Edwards provided me with the last one and I really like it.

Enough jibber-jabber, hook a brother up!
So before I give you the ability to shoot yourself in the foot, you should be aware of a few things:


  • It's not a system of record

Your DDCS (dynamic distributed coordination service as I'll call it because I can't ever use enough buzzwords) is NOT your system of record. It can be but it shouldn't be. Existing tools provide that service very well and they do it in an idempotent manner.


  • Know your configuration

This is VERY important. As I said before, much of this is environment specific. The category of information you're changing in this way is more "transient" or "point-in-time". Any given atom of configuration information has a specific level of volatility associated with it. Your JDBC connection string is probably NOT going to change that often. However, the number of application servers might change based on some dynamic external factor.


  • Your environment is dynamic and so should be your response

This is where I probably get some pushback. Just as one of the goals of "devops" was to deal with what Jesse Robbins described today as a misalignment of incentives, there's an internal struggle where some values simply fluctuate in near real time. This is what we're trying to address.


  • It is not plug and play

One thing that Chef and Puppet do very well is that you can, with next to no change to your systems, predefine how something should look or behave and have those tools "make it so".

With these realtime/dynamic configuration atoms, your application needs to be aware of them and react to them intelligently.

Okay seriously. Get to the point
So let's walk through a scenario where we might implement this ad-hoc philosophy in a way that gives us the power we're seeking.

The base configuration

  • Our application server (fooapp) uses memcached, two internal services called "lookup" and "evaluate", and a data store of some kind.
  • "lookup" and "evaluate" are internally developed applications that provide private REST endpoints for providing a dictionary service (lookup) and a business rule parser of some kind (evaluate).
  • Every component's base configuration (including the data source that "lookup" and "evaluate" use) is managed, configured and controlled by puppet/chef.


In a standard world, we store the ip/port mappings for "lookup" and "evaluate" in our CM tool and tag those. When we do a puppet/chef client run, the values for those servers are populated based on the ip/port information of our EXISTING "lookup"/"evaluate" servers.

This works. It's being done right now.

So where's the misalignment?
What do you do when you want to spin up another "lookup"/"evaluate" server? Well you would probably use a bootstrap of some kind and apply, via the CM tool, the changes to those values. However this now means that for this to take effect across your "fooapp" servers you need to do a manual run of your CM client. Based on the feedback I've seen across various lists, this is where the point of contention exists.

What about untested CM changes (a new recipe, for instance)? I don't want to apply those, but if I run my CM tool I've now not only pulled in those unintended changes but also forced a bounce of all of my fooapp servers. So as a side effect of scaling capacity to meet demand, I've reduced my capacity at another point just to make my application aware of the new settings.

Enter Noah
This is where the making your application aware of its environment and allowing it to dynamically reconfigure itself pays off.

Looking at our base example now, let's do a bit of architectural work around this new model.


  • My application no longer hardcodes a base list of servers providing "lookup" and "evaluate" services.
  • My application understands the value of a given configuration atom
  • Instead of the hardcoded list, we reduce those configuration atoms to something akin to a singleton pattern: a pointer to a bootstrap endpoint.
  • FooApp provides some sort of "endpoint" where it can be notified of changes to the number, ip addresses or urls of a given one of our services. This can also be proxied via another endpoint.
  • The "bootstrap" location is managed by our CM tool based on some more concrete configuration - the location of the bootstrap server.


Inside our application, we're now:


  • Pulling a list of "lookup"/"evaluate" servers from the bootstrap url (i.e. http://noahserver/s/evaluate)
  • Registering a "watch" on the above "path" and providing an in-application endpoint to be notified when they change.
  • Validating at startup that the results of the bootstrap call provide valid information (i.e. doing a quick connection test to each of the servers provided by the bootstrap lookup, or a subset thereof)


If we dynamically add a new transient "lookup" server, Noah fires a notification to the provided endpoint with the details of the change. The application will receive a message saying "I have a new 'lookup' server available". It will run through some sanity checks to make sure that the new "lookup" server really does exist and works. It then appends the new server to the list of existing (permanent) servers and starts taking advantage of the increase in capacity.
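Here's a hedged sketch of the application side of that flow. The /s/evaluate bootstrap URL comes from the example above; the watch-registration route, its payload, the shape of the bootstrap response and the connection check are all assumptions for illustration:

    # Sketch: bootstrap the "evaluate" server list and register a watch for changes.
    # Assumes each bootstrap entry looks like {"host": ..., "port": ...} and that a watch
    # can be registered by PUTting a path plus callback endpoint - both are guesses.
    import json
    import socket

    import requests

    NOAH = "http://noahserver"

    def healthy(server):
        # Cheap validation: can we open a TCP connection to the advertised host/port?
        try:
            socket.create_connection((server["host"], server["port"]), timeout=2).close()
            return True
        except OSError:
            return False

    # 1. Pull the current set of "evaluate" servers from the bootstrap endpoint.
    evaluate_servers = [s for s in requests.get(NOAH + "/s/evaluate").json() if healthy(s)]

    # 2. Register a watch so Noah POSTs changes to our in-application endpoint.
    requests.put(NOAH + "/watches",
                 data=json.dumps({"path": "//noah/s/evaluate",
                                  "endpoint": "http://fooapp.internal:9090/config-changed"}),
                 headers={"Content-Type": "application/json"})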

That's it. How you implement the "refresh" and "validation" mechanisms is entirely language specific. This also doesn't, despite my statements previously, have to apply to transient resources. The new "lookup" server could be a permanent addition to my infra. Of course this would have been captured as part of the bootstrapping process if that were the case.

Nutshell
And that's it in a nutshell. All of this is available in Noah and Zookeeper right now. Noah is currently restricted to http POST endpoints but that will be expanded. Zookeeper treats watches as ephemeral: once the event has fired, you must re-register that same watch. With Noah, watches are permanent.

Takeaway
I hope the above has made sense. This was just a basic introduction to some of the concepts and design goals. There are plenty of OTHER use cases for ZooKeeper alone. So the key takeaways are:


  • Know the value of your configuration data
  • Know when and where to use that data
  • Don't supplant your existing CM tool but instead enhance it.


Links
Noah
ZooKeeper
Hadoop Book (which has some AMAZING detail around ZooKeeper, the technology and use cases)