Adventures in colocation

In a previous post I had mentioned that everyone should run their own webserver and gave some basic opinions on how to pull that off. Soon after that I also wrote another post about my overly complicated web hosting setup. Since then things have gone completely off the rails. So much so that I decided to break this into at least two posts, maybe more depending on my level of laziness.

So where do I start? My first setup was a Dell R410 running VMware and a bunch of VMs. It worked great, but I wouldn’t be a true geek if I didn’t decide to up the ante a bit more. So I exercised my eBay skills and managed to get my hands on a Dell 1950 II, an HP ProLiant DL360 G4, a Dell 1250, and a Supermicro server whose model number I can’t remember.

Soon after, I expanded my colo space to 10RU and was ready to hit the ground running. I even managed to get a buddy of mine to split the cost with me. So we installed XenServer on two of the servers, pfSense on the Supermicro, and Openfiler on the Dell 1250. We built a few VMs and left things to bake for a while.

Then the shit hit the fan. We started getting tons of Pingdom and New Relic alerts for down hosts. Part of the problem was that the colo was not in a space actually designed for the heavy electrical load that comes with a data center, which caused tons of breaker trips. Side note: never a good sign to see this in a data center providing retail colocation.

[Photo: the fan keeping the breaker box cool]

Then we were informed by the provider that our gear was causing issues with other customers on the VLAN, and they decided to disconnect the uplink to my network. I spent about two weeks working with them, they eventually moved me to my own VLAN, and I took a trip to the facility to do some long-awaited maintenance. I got the network up. My firewall was happy, but my hypervisor hosts were not. Why? The storage array was not reachable. Wtf?! I grabbed a crash cart, looked at my storage host, and was greeted by the RAID controller BIOS complaining about the array being offline. Offline! All the work we had done before went right out the freaking window! As it turns out, and I knew this but never really thought about it, RAID arrays don’t really like power disruptions, especially during write operations.

After taking a long walk and lots of deep breaths, I began the task of rebuilding the array and doing all the other crap needed to set the storage up for NFS and the iSCSI block store. It took me about 16 hours to get all the maintenance and rebuilding done. Everything was up and running, my friend was able to VPN in, and everything was all good… Wrong!

The network was down again before I made it back to NYC. My best guess was that the colo lost power again; I figured this out based on the fact that the ASN of the provider was not visible in the global BGP table (I checked the looking glass pages of about three large backbone ISPs).
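If I ever have to do that sanity check again, I’d probably script it instead of clicking through looking glass pages by hand. Here’s a rough sketch, assuming RIPE’s public RIPEstat data API and its announced-prefixes endpoint behave the way they’re documented; the ASN in it is just a placeholder, not my actual provider’s:

```python
# Rough sketch: check whether an ASN still has prefixes visible in the
# global BGP table, using RIPE's public RIPEstat data API instead of
# manually loading looking-glass pages.
import json
import urllib.request

ASN = "AS64500"  # placeholder ASN, substitute the colo provider's

url = f"https://stat.ripe.net/data/announced-prefixes/data.json?resource={ASN}"
with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

prefixes = payload.get("data", {}).get("prefixes", [])
if prefixes:
    print(f"{ASN} is announcing {len(prefixes)} prefix(es); network looks up.")
else:
    print(f"{ASN} has no visible prefixes; the provider is probably dark.")
```

If that comes back empty, the provider has stopped announcing routes entirely, which in my case almost certainly meant the whole facility had lost power again.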

So after all that work, what was I left with? A crapload of money and time spent for a 30-day uptime of 10 freaking percent.

[Screenshot of Pingdom]
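For anyone who wants to feel my pain in concrete numbers, here’s the back-of-the-napkin math on what 10 percent uptime over 30 days actually works out to (just a throwaway sketch):

```python
# Back-of-the-napkin math: what 10% uptime over a 30-day window means.
DAYS = 30
UPTIME_PCT = 10

hours_total = DAYS * 24
hours_up = hours_total * UPTIME_PCT / 100
hours_down = hours_total - hours_up

print(f"Up:   roughly {hours_up:.0f} hours (~{hours_up / 24:.0f} days)")
print(f"Down: roughly {hours_down:.0f} hours (~{hours_down / 24:.0f} days)")
```

Three days of uptime out of thirty. Ouch.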

More to come. Stay tuned…