Data Sheet: Scalable Website/Application Big Data Solution

How-To: Deploy a Raw Disk Cloud Server on GoGrid
Overview
GoGrid just launched Raw Disk Cloud Servers, the perfect choice for your Hadoop data node. These purpose-built Cloud Servers run on the latest Intel Ivy Bridge processors and a redundant 10-Gbps network fabric that carries both private and public traffic at up to 10 Gbps. What sets these servers apart, however, is the massive amount of raw storage in a JBOD (Just a Bunch of Disks) configuration: you can attach up to 45 SAS disks of 4 TB each (180 TB of raw storage) to a single Cloud Server.
These servers are designed to serve as Hadoop data nodes, which are typically deployed in a JBOD
configuration. This setup maximizes available storage space on the server and also aids in performance.
There are roughly 2 cores allocated per spindle, giving these servers additional MapReduce processing
power. In addition, these disks aren’t virtual allocations carved from a larger device: each volume is a dedicated, physical 4-TB hard drive, so you get the full drive per volume with no initial write penalty.
You can use Raw Disk Cloud Servers for workloads other than Hadoop, but they should typically be deployed as part of a cluster, and your application should at least be able to handle replication and failover. Because the disks are configured as a JBOD, any data on a failed disk is most likely lost unless you have some form of replication or backup.
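The capacity cost of that replication is simple arithmetic. The sketch below assumes HDFS-style replication, where every block is stored on several different servers (HDFS's default replication factor is 3); the function name and the example cluster of three fully populated servers are hypothetical, and file system overhead is ignored.

```shell
#!/bin/sh
# Hypothetical helper: usable capacity (TB) of a cluster that keeps
# `replication` copies of every block on different servers.
# Arguments: servers, disks_per_server, disk_tb, replication
usable_tb() {
  raw=$(( $1 * $2 * $3 ))   # total raw capacity across the cluster
  echo $(( raw / $4 ))      # each TB of data consumes `replication` TB of disk
}

# One fully populated server: 45 disks x 4 TB
echo "raw per server: $(( 45 * 4 )) TB"   # raw per server: 180 TB
# Three such servers (hypothetical cluster) with 3-way replication
echo "usable: $(usable_tb 3 45 4 3) TB"   # usable: 180 TB
```

With three copies on three servers, a single failed JBOD disk loses no data, at the cost of two thirds of the raw space.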
High-Performance Infrastructure
Raw Disk Cloud Servers keep the OS disks separate from the data disks. Only the data disks are configured as JBODs, so a failure on a data disk has no impact on the OS. The JBOD disks are also direct-attached, so there is no difference between Raw Disk Cloud Servers and similar Dedicated Servers with JBOD. The storage is not apportioned from a shared pool; each volume is a dedicated, physical SAS disk. The X-Large Raw Disk Cloud Server, for example, has 3 physical 4-TB disks attached. Raw Disk Cloud Servers currently support only Linux. The disks come attached, but you need to create a file system on each one and mount it yourself.
Step 1: Select a Raw Disk Cloud Server
You can deploy a Raw Disk Cloud Server from GoGrid’s management console or through an API call.
From the management console, use the Add button and select the “Cloud Server” option. Make sure
that you’re in the US-West-1 data center because that’s the only location that currently supports Raw
Disk. In the image selector, select any 64-bit Linux OS. You’ll then see the configuration options for your Cloud Server.
There is a drop-down called “Server Flavor” with the following options: “All,” “Standard,” “SSD,” and
“Raw.” This is a filter for the “Server Size” drop-down. If you select “Raw,” then you’ll only see Raw Disk
Cloud Server options under “Server Size.” Select the Cloud Server size you’re interested in and hit the
Next button to select your subscription term and deploy your Cloud Server.
© Copyright 2014 GoGrid. All rights reserved. Various trademarks held by their respective owners.
Step 2: Configure Your JBOD Disks
Raw Disk Cloud Servers keep the OS files (including root and swap) on a separate disk, not on the JBOD, so the raw disks are used only for data. To find your volumes, run fdisk -l as root. The data disks appear as 4-TB devices because every attached raw disk is that size. In most cases, the first volume is named “/dev/xvdfa”, and subsequent volumes follow the same pattern (“/dev/xvdfb”, “/dev/xvdfc”, and so on).
Disk /dev/xvdfa: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/xvdfa doesn't contain a valid partition table
You’ll see an entry similar to this for all disks attached to your Cloud Server.
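With many disks attached, it helps to pull the data-disk names out of fdisk -l rather than reading them by eye. This is a minimal filter, assuming (as described above) that data disks are named /dev/xvdf*; the function name is hypothetical. It keys on the “Disk /dev/xvdfa:” header lines in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: print the device names of the raw data disks
# found in `fdisk -l` output read from stdin.
# Assumption: data disks are named /dev/xvdf*; the OS disk (/dev/xvda*) is skipped.
list_data_disks() {
  # Header lines look like "Disk /dev/xvdfa: 4000.8 GB, ...".
  # Requiring the trailing colon skips lines such as
  # "Disk /dev/xvdfa doesn't contain a valid partition table".
  awk '$1 == "Disk" && $2 ~ /^\/dev\/xvdf[a-z]+:$/ { sub(/:$/, "", $2); print $2 }'
}

# Usage (as root):
#   fdisk -l 2>/dev/null | list_data_disks
```

On a server with three data disks, this prints one device name per line, starting with /dev/xvdfa.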
Creating a partition is optional. If you do want a partition larger than 2 TB, create it with GNU parted and a GPT partition table, because a traditional MBR partition table can’t address more than 2 TiB with 512-byte sectors. If you skip partitioning, you can format the whole disk directly. If you’re using these Cloud Servers for Hadoop, ext3 has been extensively tested (it has been publicly tested on Yahoo’s cluster), but ext4 should also work and should perform better with large files. To format a disk directly with ext4:
mkfs.ext4 /dev/xvdfa
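Repeating that command for every disk is easy to script. The following is a minimal sketch under several assumptions: device names are passed in explicitly, the helper names (run, label_for_size, format_disk) and the DRY_RUN and PARTITION switches are hypothetical, and by default it only prints the commands it would run. label_for_size makes the 2-TB rule concrete: an MBR table stores sector numbers in 32 bits, so with 512-byte sectors it tops out at 2^32 × 512 = 2,199,023,255,552 bytes (2 TiB), and anything larger gets a GPT label. Because mkfs erases the target disk, review the dry-run output before running it with DRY_RUN=0 as root.

```shell
#!/bin/sh
# Hypothetical helper script: format raw data disks with ext4.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
# WARNING: mkfs erases everything on the target disk.

MBR_LIMIT=$(( (1 << 32) * 512 ))   # 32-bit sector numbers x 512-byte sectors = 2 TiB

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Pick a partition table type for a disk of the given size in bytes.
label_for_size() {
  if [ $(( $1 > MBR_LIMIT )) -eq 1 ]; then echo gpt; else echo msdos; fi
}

# format_disk DEVICE SIZE_BYTES
# With PARTITION=1: one partition spanning the disk, then format it.
# Otherwise (default): format the whole disk directly, as above.
format_disk() {
  dev=$1
  if [ "${PARTITION:-0}" = 1 ]; then
    run parted -s -a optimal "$dev" mklabel "$(label_for_size "$2")" mkpart primary ext4 0% 100%
    run mkfs.ext4 "${dev}1"
  else
    run mkfs.ext4 "$dev"
  fi
}

# Example: two 4-TB disks (4000787030016 bytes each, as reported by fdisk -l)
for d in /dev/xvdfa /dev/xvdfb; do
  format_disk "$d" 4000787030016
done
```

With the defaults, the example loop prints mkfs.ext4 /dev/xvdfa and mkfs.ext4 /dev/xvdfb. With PARTITION=1, it prints a parted command that creates a gpt label, followed by mkfs.ext4 on the new partition (/dev/xvdfa1).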
Step 3: Mount the Drive
You’ll need to create a directory on the file system to serve as the mount point for the new drive. For example, to create a directory called “mydisk1”, enter mkdir /mydisk1 at the prompt. Once you’ve created the directory, you can mount your disk:
mount /dev/xvdfa /mydisk1
You should now be able to read and write files in your /mydisk1 directory. If you run df -h, you’ll see your drive and the /mydisk1 mount point.
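When a server has more than one data disk (the X-Large size has 3, and the largest can have up to 45), repeating mkdir and mount by hand gets tedious. This minimal sketch mounts each disk at /mydisk1, /mydisk2, and so on, following the naming above; the function names and the DRY_RUN switch are hypothetical, and by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: mount each data disk at /mydisk1, /mydisk2, ...
# DRY_RUN=1 (the default) prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# mount_all DEVICE...
mount_all() {
  n=1
  for dev in "$@"; do
    run mkdir -p "/mydisk$n"          # create the mount point if needed
    run mount "$dev" "/mydisk$n"      # mount the disk on it
    n=$(( n + 1 ))
  done
}

mount_all /dev/xvdfa /dev/xvdfb
```

Once the printed commands look right, run it as root with DRY_RUN=0; df -h should then list one mount point per disk.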
Step 4: Make the Drive Permanent
The steps above are all you need to get the new device up and running. If you want the drive to mount automatically after a reboot, however, you’ll need to add a line like the following to your “/etc/fstab” file:
/dev/xvdfa /mydisk1 ext4 defaults,nobootwait,noatime 0 0
This is a slight change from the typical “fstab” entry. The nobootwait option keeps Linux from stalling the boot if the disk isn’t available. The final “0 0” fields disable dump backups (if dump is activated on your Cloud Server) and the automatic file system check at boot; leaving both of these turned on can cause the Cloud Server to stall. The noatime option prevents reads from turning into unnecessary writes (updates to each file’s access time), which helps improve performance; it is optional but typically recommended for Hadoop.
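Adding one fstab line per disk can also be scripted. This minimal sketch assumes the /mydiskN mount points used above and reuses the options from the example entry; the function names and the FSTAB and OPTS variables are hypothetical. Note that nobootwait is an option of Ubuntu’s Upstart-era mountall; on systemd-based distributions the usual equivalent is nofail, so set OPTS to match your image.

```shell
#!/bin/sh
# Hypothetical helper: append one fstab entry per data disk.
# FSTAB defaults to /etc/fstab but can point at a scratch file.
# OPTS defaults to the options above; nobootwait is Ubuntu/mountall-specific
# (assumption: use nofail instead on systemd-based distributions).
FSTAB=${FSTAB:-/etc/fstab}
OPTS=${OPTS:-defaults,nobootwait,noatime}

fstab_line() {                  # fstab_line DEVICE MOUNTPOINT
  printf '%s %s ext4 %s 0 0\n' "$1" "$2" "$OPTS"
}

add_fstab_entries() {           # add_fstab_entries DEVICE...
  n=1
  for dev in "$@"; do
    # Skip devices that already have an entry, so reruns don't duplicate lines
    if ! grep -q "^$dev[[:space:]]" "$FSTAB" 2>/dev/null; then
      fstab_line "$dev" "/mydisk$n" >> "$FSTAB"
    fi
    n=$(( n + 1 ))
  done
}
```

For a dry check, set FSTAB to a scratch file and inspect the result. After updating the real /etc/fstab as root, mount -a mounts every fstab entry and will surface mistakes before you reboot.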
Reboot and verify that you still see the drive and mount point. The easiest way to do so is to run df -h. The output will look like this:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       36G  1.3G   33G   4% /
udev            7.8G   12K  7.8G   1% /dev
tmpfs           3.2G  224K  3.2G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.9G     0  7.9G   0% /run/shm
/dev/xvda1      184M   42M  133M  25% /boot
/dev/xvdfa      3.6T  196M  3.4T   1% /mydisk1
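The 3.6T size for /dev/xvdfa is the same 4-TB disk expressed in different units: fdisk reports decimal gigabytes (10^9 bytes), while df -h reports binary units, where 1T is 2^40 bytes. A quick awk check, using the sector and byte counts from the fdisk -l output in Step 2, shows the conversion; the helper names are illustrative.

```shell
#!/bin/sh
# Convert a raw byte count to decimal GB (as fdisk reports it) and to
# binary TiB (as df -h reports it). Helper names are illustrative.
bytes=$(( 7814037168 * 512 ))   # sectors x sector size = 4000787030016

to_gb()  { awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e9 }'; }
to_tib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 4) }'; }

echo "$bytes bytes"             # 4000787030016 bytes
echo "$(to_gb "$bytes") GB"     # 4000.8 GB
echo "$(to_tib "$bytes") TiB"   # 3.64 TiB
```

The Avail column (3.4T) is smaller still, mainly because ext4 by default reserves 5% of the blocks for the root user and also sets aside space for inode tables and the journal.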
Step 5: Start Storing Stuff!
Now that you’ve mounted your drive, you can start using it as a data node for your Hadoop cluster. You
can deploy any distribution of Hadoop that you prefer or you can wait until we release our 1-Button
Deploy of HBase. You can also use these nodes as a large disk array; in that case, you’ll want software that manages replication, or you can configure software RAID on the server. Either way, you’ll want multiple servers to protect against failure. GoGrid is committed to releasing infrastructure that
is designed to support Big Data applications, and you can expect to see more applications and
infrastructure options coming soon!
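On a Hadoop data node, each JBOD mount point is typically listed as a separate data directory so that HDFS can spread blocks across all disks. In Hadoop 2.x, that list is the comma-separated dfs.datanode.data.dir property in hdfs-site.xml (Hadoop 1.x calls it dfs.data.dir). This sketch builds the property for mount points named as in Step 3; the /mydiskN/hdfs/data subdirectory layout and the function name are hypothetical choices, not GoGrid or Hadoop requirements.

```shell
#!/bin/sh
# Build a comma-separated list of HDFS data directories, one per JBOD
# mount point (/mydisk1 ... /mydiskN), and print it as an hdfs-site.xml
# property. The "hdfs/data" subdirectory is a hypothetical layout.

data_dirs() {                   # data_dirs NUM_DISKS
  i=1
  list=""
  while [ "$i" -le "$1" ]; do
    list="${list:+$list,}/mydisk$i/hdfs/data"   # add a comma only after the first entry
    i=$(( i + 1 ))
  done
  echo "$list"
}

# Example for an X-Large server with 3 data disks
cat <<EOF
<property>
  <name>dfs.datanode.data.dir</name>
  <value>$(data_dirs 3)</value>
</property>
EOF
```

Per Hadoop 2.x’s hdfs-default.xml, dfs.datanode.failed.volumes.tolerated defaults to 0, meaning a DataNode stops serving when any one of its volumes fails. On a JBOD node with many disks, you may want to raise it.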
This article is based on a blog post by Rupert Tagnipes.
About GoGrid
GoGrid enables companies to evaluate and run multiple, on-demand big data solutions quickly, simply, reliably,
securely, and cost-effectively. As the leader in Open Data Services (ODS), GoGrid is committed to delivering
purpose-built Big Data solutions and services for the management and integration of open source, commercial,
and proprietary technologies across multiple platforms. With over 15,000 customers and over 600,000 VMs
deployed, GoGrid has pioneered cloud infrastructure for more than a decade for companies like Condé Nast,
Merkle, and Preventice. For more information, please visit www.GoGrid.com.
HT_Deploy-Raw-Disk-CS_20140130