Wednesday, September 29, 2010

Distributions and Dynamic Languages - A Manifesto

Background

There's been a lot of talk recently on Twitter and in various posts across the Intertubes about how the various distributions handle dynamic languages and the package system those languages use. This has been a sore spot for me for a LONG time. Recently I had a chance to "stub out" my feelings in a comment on HN. I've been meaning to write this post for a few weeks but just haven't had the time. I'm making the time now.

Distro vendors find themselves in an interesting spot. In general, the difference between Linux distributions has boiled down to a few categories:

  • Support
  • Management tools
  • Package format
  • Default desktop

For the home/desktop user, the last two (and more importantly the last one) are the biggest deciding factors. For the "enterprise" user, the first is typically key. But not all enterprises are enterprises. Would anyone argue that Facebook or Google or Twitter are not enterprise users? Of course not. However those companies don't tend to need the same level of support and have the same hang-ups as Coca-Cola or Home Depot. The latter two companies are the traditional enterprise that does things like troubleshoot servers when they fail. The former are the forward thinking companies that say "Fuck it. Pull the server and put another one in. We don't have time for this 'bench' shit."

In the same vein, the first group of companies are the kind that use Linux as a platform, whereas the second group uses RedHat or Suse as an OS to host JBoss or Oracle or DB2. Those vendors say "We run on distros X and Y and we only support those". You don't have a choice in the second group. The first group may have standardized on a distro but the distro itself is irrelevant. Those companies use Chef and Puppet and similar tools to totally abstract that out. The distro becomes a commodity. They just want Linux.

This is the new type of company and this is the type of company that distro vendors have to worry about.

So having said that, how does this tie into the dynamic language debacle of late? Increasingly, applications in the PaaS/SaaS space are being written in dynamic languages. The product is just different than Oracle or DB2. So these companies need to consider which distro will make using those dynamic languages as easy as possible. Frankly, they've all pretty much fucked it up. The main reason? Traditional software products.

The biggest selling point of an enterprise distro was support. That, or the fact that you were required to run RedHat or Suse for your RAC cluster. One of the main reasons that enterprise distros were able to be supported platforms for Oracle or DB2 is that they "stabilized" things. In this case that meant long term support (LTS) models and a consistent base operating system. If you ported your product to run on RHEL4, you could guarantee that RedHat would never break compatibility for the life of that product support cycle (I think it's 7 years right now?). You could also be assured that version X of a package would be available for the platform should you need it.

The Problem

That worked fine for binary COTS products. Not so fine for the world of dynamic languages where new versions of a Gem or Python package come out daily. And ESPECIALLY not when the language package system allows for multiple versions of the same package to be installed alongside each other. But is this really a big deal? The distros can just upgrade python to 2.7 right? Nope and the reason why?

Management tools

I don't fault the distro vendors for using python (as an example) as the higher level management language for the OS. In fact, having now gotten into Python, I think it's a wonderful idea. It is, language wars aside, a very approachable and consistent language. It allows them to quickly iterate those tools and especially in the case of Python, the core language changes very little. It's mature.

So now distro vendors have gone and written core parts of the operating system to use Python. Combine that with the package manager restrictions and LTS and you have a system where, if you upgrade Python, you've broken the system beyond repair. This is why RHEL5 is still on Python 2.4.

This is where we find ourselves today. Distro vendors have to continually package all the python modules they want to supply in native package format to the version of the runtime they use. Eventually the module/gem maintainer is going to stop supporting that module on such old runtimes. Now they essentially have to maintain backports for the life of the LTS terms. This is madness. Why would you put yourself in this situation? I didn't know this but FreeBSD evidently solved this problem a while ago by moving all core scripts away from Perl.

The Manifesto

So here's my manifesto. My suggestion if you will as a long time Linux user, enterprise customer and dynamic language programmer.

Stop it. Get out of the game now. As much as you would like to think your customers care about LTS for Perl/Python/Ruby, they don't. Your LTS is irrelevant six months after you cut a new release of a distro. RHEL6 is shipping with Ruby 1.8.6. Seriously? Not even 1.8.7? I understand they have a long development cycle for new distro versions which is why I'm saying get out. You can't keep up.

But what about our management tools?

I've solved that for you too. system-python, system-ruby, system-perl. Isolate them. Treat them as you would /opt/python or /opt/ruby. Make them untouchable. Minimize your reliance on any module/gem/library you don't directly maintain (e.g. a gtk python module). Understand that you will be wasting resources on backporting this module for 5 or 7 years. No more '/usr/bin/env python'. Shebang that bastard to something like '/usr/lib/system-python/bin/python'.
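For what it's worth, here's a rough sketch of what that shebang pinning could look like as a packaging-time step. The interpreter path is the made-up one from the paragraph above, and pin_shebang is a hypothetical helper of mine, not something any distro ships.

```python
import re

# Hypothetical home of the isolated interpreter (the example path from the text).
SYSTEM_PYTHON = "/usr/lib/system-python/bin/python"

# Matches a shebang that finds python via env or a /usr(/local)/bin path,
# with an optional version suffix and optional interpreter arguments.
_SHEBANG = re.compile(
    r"^#!\s*(?:/usr/bin/env\s+|/usr/(?:local/)?bin/)python[0-9.]*(?:[ \t]+(.*))?$"
)


def pin_shebang(script_text, interpreter=SYSTEM_PYTHON):
    """Return script_text with its first-line python shebang pointed at interpreter."""
    first, sep, rest = script_text.partition("\n")
    match = _SHEBANG.match(first)
    if match is None:
        return script_text  # not a python shebang, leave the script alone
    args = (match.group(1) or "").strip()
    new_first = "#!" + interpreter + (" " + args if args else "")
    return new_first + sep + rest
```

Run over every management script at build time, something like this would keep the system tools on the system interpreter no matter what else lands on the PATH later.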

So now that you've isolated that dependency, what about people who don't WANT to compile a new ruby or python vm? How do you provide value to them? The ActiveState model. /usr/lib/python27, /usr/lib/python31, /usr/lib/ruby187.

But wasn't the point of this whole discussion around DLR package management? We don't want to maintain a package per vm version of some library.

Then don't.

This is where the onus is on the language writers. Your package format needs to FULLY support installing from a locally hosted repo of some kind. You may not believe it but not every server has internet access. At our company, NONE of the servers can get to the Internet. They still serve content TO the internet but can't get out. Not by proxy. Not at all.

We're essentially forced to download python packages or jar files and copy them to a maven server or host them from apache to use them internally. Either that, or package them as RPMs. With the python packages, it's especially annoying because, while pip will happily pull from any apache-served directory of tarballs, we can't push from setup.py to it. We don't have ANY metadata associated with it at all.

So Ruby/Python/Perl guys, you need to either provide a PyPi/Gem server package that operates in the same way as your public repos do or make those tools operate EXACTLY the same with a local file path as they do with a URL. Look at createrepo for RPMs for an idea of how it can work if you need to. Additionally, tools like RVM and virtualenv really need to work with distro vendors. RVM does a stellar job at this point. Virtualenv has a way to go.
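To make the createrepo comparison a bit more concrete, here's a minimal sketch of the same idea for Python: walk a directory of tarballs and write a static index next to it. It uses the "simple" directory layout that was later written down in PEP 503. The function names are mine, and the filename parsing is a naive assumption (project name is everything before the last hyphen), so treat it as an illustration, not a real index server.

```python
import html
import os
import re


def normalize(name):
    """Normalize a project name as PEP 503 describes (runs of -, _, . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _write_page(path, links):
    with open(path, "w") as handle:
        handle.write("<!DOCTYPE html>\n<html><body>\n%s\n</body></html>\n" % "\n".join(links))


def build_simple_index(package_dir, index_dir):
    """Write index_dir/simple/<project>/index.html for each archive in package_dir.

    Assumption: archives are named <project>-<version>.tar.gz (or .zip), so the
    project name is everything before the last hyphen. Returns the project names.
    """
    projects = {}
    for filename in sorted(os.listdir(package_dir)):
        if not filename.endswith((".tar.gz", ".zip")) or "-" not in filename:
            continue  # skip anything that does not look like a source archive
        project = normalize(filename.rsplit("-", 1)[0])
        projects.setdefault(project, []).append(filename)

    simple_dir = os.path.join(index_dir, "simple")
    os.makedirs(simple_dir, exist_ok=True)
    for project, files in projects.items():
        project_dir = os.path.join(simple_dir, project)
        os.makedirs(project_dir, exist_ok=True)
        links = []
        for filename in files:
            # Relative link; assumes both directories sit under the same web root.
            target = os.path.relpath(os.path.join(package_dir, filename), project_dir)
            href = html.escape(target.replace(os.sep, "/"))
            links.append('<a href="%s">%s</a><br/>' % (href, html.escape(filename)))
        _write_page(os.path.join(project_dir, "index.html"), links)

    # Root page listing every project, one link per project directory.
    root_links = ['<a href="%s/">%s</a><br/>' % (p, p) for p in sorted(projects)]
    _write_page(os.path.join(simple_dir, "index.html"), root_links)
    return sorted(projects)
```

Serve the result from apache and a client should be able to use it with pip's --index-url option pointed at the simple/ directory. It still doesn't give you a way to push from setup.py, which is exactly the gap I'm complaining about.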

So now the distro vendors have things isolated. They ship said language repo server and by default point all the local language package tools to that repo path or server. Now if the user chooses to grab module X from PyPi to host locally, they've made that decision. It doesn't break the OS. You don't offer support for it unless you really want to and this whole fucking problem goes away.
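As a sketch of the "point the tools at the local repo by default" part, this is roughly what a distro could generate for pip. The no-index and find-links settings are real pip options; the repo path and the render_pip_config helper are hypothetical, and where the resulting file lives would be up to the distro (current pip reads a system-wide /etc/pip.conf on Linux).

```python
import configparser
import io


def render_pip_config(repo_path):
    """Return pip configuration text that restricts installs to a local repo."""
    # Interpolation is disabled so a '%' in the path is written out literally.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {
        "no-index": "true",       # never contact the public index
        "find-links": repo_path,  # look for archives in this directory or URL
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # /srv/pypi is a made-up location for the locally hosted packages.
    print(render_pip_config("/srv/pypi"))
```

A user who wants the public index back just overrides the setting in their own config, and at that point they've made that decision themselves.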

EDIT:

I realize I'm not saying anything new here. I also realize that distro vendors realize that the distro itself is a commodity. RedHat figured that out a long time ago. Look at the JBoss purchase and everything since then. Additionally, virtualization removes any reason you might have for picking distro X over distro Y because of hardware support in the distro.

3 comments:

Dave Cameron said...

I was wondering if you would mention RVM. Together with gemsets it seems, from my relatively ruby-naive point of view, to address a fair chunk of these problems for ruby.

Ivy is a tool in java-land that handles collections of jars at development time. Several people I know argue that the ONLY correct way to use ivy is to have a curated source of jars in house. Otherwise, you get into the maven pattern of downloading the whole internet.

I wonder if this should be a problem at deployment time at all, or if the artifact that comes out of the app build process should already have all the required dependencies included. At some point this does bloat out the artifact too much, and there can be system specific binary dependencies that will need to be different in production.

This post was a thought-provoking read at least.

lusis said...

Dave,

Thanks for the comment. I did mention RVM somewhere in my screed but essentially it does address the problems. When I hear about someone who ISN'T using RVM and the hurdles they go through, I cringe.

I'm also glad you mentioned Ivy. I actually meant Ivy and not Maven. I can't remember the tool offhand but we have a server that does that curation for us.

I think curation is the key. Regardless of my opinions of Apple, I think the distros are now in the business of curation (if they weren't already) and how they curate dynamic languages is a big issue.

They didn't have to do this with Java for the longest time because they COULDN'T. It wasn't allowed because they couldn't ship it in the first place. You have the JPackage repository for RHEL derivatives but it is actually seeing the same issues. I mean they have/had at one point "tomcat4", "tomcat5", "tomcat6" packages in the repo but that still feels kludgy to me.

Again, thanks for the comments. I'm totally ganking curation as the buzzword for this topic from you ;)

Wayne said...

I actually created RVM to address many of the concerns and issues that you listed here. Let the System ruby be the system ruby and don't touch it, etc...

As long as you upload the ruby tarballs to the archives directory RVM will compile and install without reaching out to the internet :)

~Wayne