Cassandra and Priam

Using Buri to assist deployment

Presented by Joe Hohertz / @joehohertz

Slides @ http://jhohertz.github.io/cass-buri

Welcome

Audience

  1. Anyone curious about Priam
  2. Those concerned with operation of Cassandra
  3. Users of Amazon Web Services
  4. People interested in NetflixOSS

Shameless Plug

Viafoura

Who am I?

Joe Hohertz

  • Been building networks/systems since 1996
  • More software development focus since 2005
  • Specialty in open source
  • Recent focus on cloud systems

Summary

  • Explore what Priam is, what it does
  • Challenges of deploying Priam
  • Introduction to Buri
  • Using Buri to deploy Priam

Let's get started!

Priam

What is Priam?

  • Co-process / Sidecar for Cassandra
  • Released by Netflix as open source
  • Implemented in Java, as a web application
  • Assists the operation of Cassandra clusters in EC2

What operations does it handle?

  • Manages many of the cassandra.yaml values
  • Start/Stop of cassandra processes
  • Discovery of topology info for configuring tokens
  • Within Cassandra, a startup handler provides seed information (see the REST sketch below)
  • Controls bootstrap mode (with our patches)
  • Backup to S3 storage, restoration of nodes
  • Dead node replacements
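
Priam exposes these operations through a local REST API; Cassandra's Priam-provided seed provider, for example, asks Priam for its seed list at startup. A minimal sketch of probing that endpoint by hand (path per Priam's cassconfig REST resource; verify it against your Priam version):

# Ask the local Priam instance for the seed list it hands to Cassandra
curl http://localhost:8080/Priam/REST/v1/cassconfig/get_seeds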

Limitations

  • Does not yet support vnodes (multiple tokens)
  • Size growth therefore must be based on doubling clusters
  • Some awkward configuration when running in VPCs in AWS

Other challenges

  • Available documentation is not up to date
  • Project is a bit neglected relative to other NetflixOSS projects
  • Current Netflix tree has a serious bug
  • Forks outside of Netflix diverge heavily
  • Very deeply rooted in 1.x Cassandra. 2.0+ challenges some of its assumptions
  • New developments seem to indicate a trend towards becoming DSE-specific

Patches needed to run Priam successfully w/ C* 2.0+

  • 2.0+ Streaming API changes
  • Cluster bootstrapping changes
  • Gossipinfo REST call fixes

2.0+ Streaming API changes

  • Affects network statistics REST call (querying streams)
  • Also affects restoring backups (initiating streams)
  • We use pull request #346 to address this.

Cluster bootstrap changes

  • Priam attempts to set auto bootstrap on every node
  • In 1.x it was possible to get away with this
  • 2.x is more strict
  • We have modified Priam to ensure the very first node does not get this set.
  • Requires patch to Cassandra to expose auto bootstrap flag as a system property, included in 2.0.10+
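
For illustration, with the 2.0.10+ patch the flag is a standard JVM system property, so the first node can be started with bootstrap disabled (a sketch, not Priam's exact invocation):

# Cassandra 2.0.10+ exposes auto bootstrap as a system property
bin/cassandra -Dcassandra.auto_bootstrap=false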

Gossip info REST call fix

  • REST call meant to work like nodetool gossipinfo
  • Current code in Priam corrupts the response by not breaking up the data correctly
  • Causes duplicate entries for some nodes in the response
  • As of commit 6eb29e7 Priam started using this REST call to probe other nodes on launch to determine if it is performing a dead node replacement
  • Will cause eventual failures replacing nodes, requiring manual cleanup of gossipinfo via JMX
  • We have a still-pending pull request, #350, addressing this issue.

Our Priam Fork

  • We'd rather not have a fork, however...
  • We wanted to have a fully patched tree that works.
  • No success with getting pull requests merged so far
  • Located here: https://github.com/viafoura/Priam

Deployment

Use of autoscale groups

  • Priam requires deployment in an auto-scaling group (ASG)
  • Not used for any aspect of "auto" scaling
  • Separate ASGs per availability zone
  • Priam uses two things from the ASG:
    • Name, which must be a composite of your cluster name and the availability zone
    • Maximum instances, used to determine size of cluster

SimpleDB for shared configuration

  • Two SimpleDB domains hold the shared configuration
  • PriamProperties, which holds both the configuration of Priam itself and the variables it will pass to cassandra.yaml across a cluster
  • InstanceIdentity, which it uses to track the state of the cluster, active/dead nodes, and token assignments
  • These must be initialized prior to launching your cluster (a sketch follows)
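
The Buri Priam role can handle this initialization (covered later), but as a rough sketch it can also be done by hand with the AWS CLI; the domain and attribute names below follow Priam's commonly documented SimpleDB layout, so verify them against your Priam version:

# Create the two domains Priam expects
aws sdb create-domain --domain-name PriamProperties
aws sdb create-domain --domain-name InstanceIdentity

# Write one Priam property for cluster "mycluster" (illustrative names)
aws sdb put-attributes --domain-name PriamProperties \
    --item-name "mycluster.priam.clustername" \
    --attributes Name=appId,Value=mycluster Name=property,Value=priam.clustername Name=value,Value=mycluster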

Immutable deployment

  • How we (and Netflix) deploy Priam
  • What is it?
  • Machine images are generated via a build process
  • Live machines are never updated directly
  • Build a new machine image, deploy, cut over

Aminator

  • Tool for working with EC2 AMIs
  • Mounts volume from an existing EBS AMI's underlying snapshot
  • Runs a provisioner within chroot of mount point
  • Unmounts and snapshots the volume
  • Registers new AMI against the new snapshot
  • Built in provisioners for APT/YUM installations
  • Bring your own base AMIs
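
For reference, a typical Aminator run looks something like the following; the environment name and package are illustrative:

# Provision the "helloworld" package onto a base AMI and register the result
sudo aminate -e ec2_aptitude_linux -B ami-xxxxxxxx helloworld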

Layered AMI generation

  • Foundation: Very close to a vanilla install of the OS
  • Base: Local additions to the foundation AMI, things you want everywhere
  • Role-specific: Run against the base, for a particular application
  • Why?
  • Consistency + Speed in generating final role AMIs

Buri

What is Buri?

  • Implemented mostly in Ansible
  • Python-based wrapper to simplify use and provide some additional functions

Features of Buri

  • Uses Ansible to provide a collection of roles useful for NetflixOSS work
  • Allows "installation" of Ubuntu as a foundation AMI
  • Provides templated approach to role definitions for webapps under Jetty, and JSVC-compatible java daemons
  • Support for binary or source-based installations from git for most roles
  • Has its own Aminator-like provisioning, with different strengths and weaknesses
  • An early version of an Aminator plugin to use Buri as the provisioner is available
  • Provides off-cloud all-in-one demonstration roles for Flux Capacitor and Netflix RSS Recipes

Differences between Aminator and Buri's AMI generation features

  • Aminator has better support for running concurrent jobs. Buri has basic protections, but lacks hard locking due to limitations in Ansible
  • Buri supports the historical no-partition volumes, as well as normal partitioned systems, for all machine types. Aminator does not currently support partitioned volumes.
  • Buri can register all combinations of HVM/PVM machine AMI, and S3/EBS root storage, as part of a single run. Aminator requires separate jobs to be run.
  • Using Buri's provisioner directly may be more convenient when developing roles, with Aminator plus the plugin used for the "real" production-bound generations.

Differences between NetflixOSS-Ansible, and Buri's role library

  • Some roles are directly carried over and enhanced (Ice, Asgard, Edda)
  • Buri only currently targets Ubuntu LTS releases, NetflixOSS-Ansible targets Amazon Linux as well
  • Many new roles in Buri (Exhibitor, Priam, Flux Capacitor demo)
  • Focus in Buri on both EC2 and local development VM deployment
  • Buri biases access controls to be handled via IAM, vs. using API keys

Using Buri

Overview

  • Initial look at demo on Local VM via Vagrant
  • Configuring Buri for your EC2 environment
  • Bootstrapping a build environment in EC2
  • Creating a foundation AMI
  • Creating a base AMI
  • Creating a builder role AMI
  • Creating other role AMIs

Requirements for local VM / Bootstrap

  • Ansible 1.6.x
  • For local VMs: JDK, Oracle VirtualBox, Vagrant, 8+GB RAM on workstation
  • git

Launch all-in-one Flux Capacitor demo


# checkout Buri
git clone -b develop https://github.com/viafoura/buri
cd buri

# add vagrant plugin requirement
vagrant plugin install vagrant-host-shell

# launch and provision!
vagrant up
                

Configure Buri for YOUR EC2 environment


# In Buri checkout
mkdir local                    # only needed if you never ran the VM above

# Copy default configurations as starting point
cp -rv etc/inventory local/

# Edit variables for target environment (we will use "test")
vi local/inventory/group_vars/test

# Uncomment the environment line and set default to test:
vi etc/buri.cfg
                
  • Account numbers and S3 buckets are the first things to modify (a sketch follows)
  • You should commit the local folder to a *private* repository, or manage it in some other manner
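
As a rough sketch, the edited group_vars file is plain Ansible YAML; the variable names below are hypothetical stand-ins for whatever etc/inventory actually defines:

# local/inventory/group_vars/test -- hypothetical variable names
aws_account_number: "123456789012"
buri_s3_bucket: "my-buri-images"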

Bootstrapping a build node

  • Set up an IAM role using policies/aminator.sample as a template (modify the S3 bucket reference to match what you created)
  • Launch an official Ubuntu AMI with the IAM role assigned
  • From your workstation with Buri:

# In Buri checkout
./buri --environment test buildhost HOSTNAME

# Pre-installed Buri is a WIP; ignore for now and copy it (with the local folder) from your workstation
scp -r . ubuntu@HOSTNAME:buri

# login to node and use it from here on
ssh ubuntu@HOSTNAME
cd buri
                

Create foundation AMI


# From buri folder on bootstrapped host:
sudo ./buri foundation
                
  • Make note of the EBS/PVM AMI ID, which is always used for re-snapshotting against an image

Derive base AMI from foundation


# From buri folder on bootstrapped host:
sudo ./buri resnap FOUNDATION-AMI-ID base
                
  • Make note of the EBS/PVM AMI ID, which most roles will use as the base to provision upon

Derive builder AMI from base

  • Recreates the same build environment you are using now, so it can be started directly
  • Eventually you will want to boot it and shut down the bootstrap node, but for now you can keep using the bootstrap node

# From buri folder on bootstrapped host:
sudo ./buri resnap BASE-AMI-ID aminator
                

Using Buri to Deploy Priam

Priam Configuration


# Key variables for Priam:

# Set this true unless in a VPC in a single region
priam_multiregion_enable: true

# How Priam reports cluster members to each other changes in a VPC
priam_vpc: true

# Ec2MultiRegionSnitch is recommended in general; in a single-region VPC, set to Ec2Snitch
priam_endpoint_snitch: "org.apache.cassandra.locator.Ec2Snitch"

priam_zones_available: "us-east-1a,us-east-1d,us-east-1e"

priam_s3_bucket: "your_s3_bucket/some_optional_path"
                

Derive Priam AMI from base

  • Cluster names are special: they are specified on the command line, so you can generate several AMIs with small configuration differences if you like/need.
  • The role is also special in that, as the image is generated, it sets up the SimpleDB entities needed for running Priam, if necessary.

# From buri folder on bootstrapped host:
sudo ./buri --cluster-name your-name resnap BASE-AMI-ID priam
                

Set up S3/IAM role for Priam

  • Set up an IAM role using policies/priam.sample as a template
  • Modify the S3 bucket reference to match what you create for the purpose
  • Note that you can use the same bucket for multiple clusters
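
The repository's policies/priam.sample is authoritative; as a rough sketch, the S3 portion of such a policy looks something like this (bucket name illustrative):

{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject"],
  "Resource": ["arn:aws:s3:::your_s3_bucket", "arn:aws:s3:::your_s3_bucket/*"]
}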

Create security group for Priam cluster

  • Need a security group per cluster name
  • Names are important; the Buri convention is priam-CLUSTER-NAME
  • All members of the group should be able to talk to all others in group on TCP 1024-65535
  • Client port access (CQL, Thrift, etc)
  • Other admin access per your conventions
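
A sketch with the AWS CLI for a cluster named mycluster (client ports and source ranges are illustrative; adjust for VPC use):

# Create the group, then allow intra-group traffic on the high TCP ports
aws ec2 create-security-group --group-name priam-mycluster \
    --description "Priam cluster mycluster"
aws ec2 authorize-security-group-ingress --group-name priam-mycluster \
    --protocol tcp --port 1024-65535 --source-group priam-mycluster

# Client access, e.g. CQL (9042) and Thrift (9160), per your conventions
aws ec2 authorize-security-group-ingress --group-name priam-mycluster \
    --protocol tcp --port 9042 --cidr 10.0.0.0/8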

Create launch configuration for autoscale groups

  • Launch configuration should specify the Priam AMI, IAM role, security group.
  • If running in a non-default VPC, ensure assign public IP is selected.
  • Ensure all the ephemeral storage is activated in the config.
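
A sketch with the AWS CLI; the names, AMI ID, instance type, and device mappings are illustrative:

aws autoscaling create-launch-configuration \
    --launch-configuration-name mycluster-lc-v1 \
    --image-id ami-xxxxxxxx \
    --instance-type m1.xlarge \
    --iam-instance-profile priam \
    --security-groups priam-mycluster \
    --associate-public-ip-address \
    --block-device-mappings '[{"DeviceName":"/dev/sdb","VirtualName":"ephemeral0"},{"DeviceName":"/dev/sdc","VirtualName":"ephemeral1"}]'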

Create per-zone autoscale groups

  • Each uses same launch configuration
  • Names MUST be the cluster name, a dash, and the zone name stripped of its dashes
  • e.g., if your cluster is named mycluster, and you are using us-east-1 only, in zones b, d, and e, you would need ASGs named:
    • mycluster-useast1b
    • mycluster-useast1d
    • mycluster-useast1e
  • Set each ASG size to only one instance for now; do not set rules for automatic size changes
  • Once complete, you should see 3 instances launching
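
Continuing the AWS CLI sketch for one of the three zones:

# One ASG per AZ; name = cluster name + "-" + zone with its dashes stripped
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name mycluster-useast1b \
    --launch-configuration-name mycluster-lc-v1 \
    --availability-zones us-east-1b \
    --min-size 1 --max-size 1 --desired-capacity 1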

What happens as it comes online

  • If more than one ephemeral drive is available, they will be striped into a single volume on /mnt. Depending on the size of the storage, this may take some time.
  • The first node Priam comes up on will see no other seeds, and will disable auto bootstrap to initialize the first node of the ring.
  • Subsequent nodes will see seeds available, and will auto bootstrap into the ring using those seeds.

What to do from here?

  • Double the ring as needed to scale the cluster (make the REST call sketched below, then expand the ASG sizes)
  • Load dummy data with cassandra-stress, specifying a replication factor >1; kill nodes one at a time and observe that replacements come online and stream data to themselves on joining
  • Check S3, which should be getting SSTables backed up in near real-time
  • Set up expiration/Glacier policies on the S3 bucket
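
For the doubling step, Priam exposes a REST call on each node (path per Priam's cassconfig resource; verify against your Priam version):

# Prepare token assignments for a doubled ring, then grow the ASG sizes to match
curl http://localhost:8080/Priam/REST/v1/cassconfig/double_ring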

Resources:

THE END