Wow, we are at the 1 year mark of having Cisco UCS in our environment, where it is now our standard, and what a year it has been. I was fortunate enough to present to a lot of people at a few conferences, sit on some expert panel discussions as well as the Unified Compute Advisory Board, and talk to a lot of new customers during their investigation periods into UCS. It has been satisfying to hear that the majority of reference customers I have talked with decided to go with Cisco UCS. There are even a few that are blogging about it!
I figured it is time to update everyone on the current configuration build we are using. I think back to when we started with VMware on 3.5 and how much more complex it all seems now, but with that complexity we have all gained greater control, cost savings and agility.
Cisco UCS B200 M1 or M2 blade with 2 CPU, 96 GB memory, Cisco VIC (Palo) interface card.
Cisco Nexus 1000v distributed virtual switch.
EMC PowerPath/VE for fiber channel multi-pathing.
VMware 4.1 Enterprise Plus (not yet using 4.1i, but soon) with Enhanced vMotion Compatibility (EVC) enabled.
So what has changed for us since last December when we went into production on UCS? Well, Cisco, VMware and EMC keep creating new technology, and it all still fits and works together.
The release of the M2 blade brought with it the Intel Westmere CPU and 6 cores per CPU, up from 4 on the M1. For about the same price point we were able to add 50% more processing power in a half size blade. With Enhanced vMotion Compatibility enabled in our VMware cluster, these new M2 blades work seamlessly with the M1 blades.
Cisco VIC (Palo) mezzanine interface card was released and has provided us with a more flexible means to implement converged networking at the host level. There are a few things you can do with the Virtual Interface Card but one of the main advantages we have incorporated is carving out additional Ethernet interfaces on our ESX hosts. So what, you might say? For example, we created 2 NICs for ESX management which reside outside of the virtual distributed switch that services our guest traffic. This simplifies our build process and allows us to control and manage the host if there is an issue with the vDistributed Switch.
Cisco Nexus 1000v has been around for a little while, but we have now implemented it to really bring the networking side of the virtual environment under the control of our network engineers. As our environment has grown, the desire and need to have visibility into our guest server traffic has increased. The N1KV has already been helpful in troubleshooting a few issues and we are likely to be implementing some QoS in the near future. Note: when you pair the QoS functions within UCS, the VIC and Nexus 1000v, you have a very powerful tool for implementing your key compute functions in a virtual environment. You can guarantee your service levels and have confidence in the implementation.
EMC PowerPath/VE development has continued and our driver issue with the VIC has been resolved for a while. The coolest new thing here is on the ESXi front: PP/VE is now supported with boot from SAN (that will be our next step moving forward).
VMware ESX 4.1 & 4.1i keeps us current on the newest tools and optimizations.
As you know, IT and technology are very dynamic, and we are already planning on changing things within 60 days. We are going to an ESXi build with boot from SAN and implementing the new UCS 1.4 code so we can use many of the new UCS features, which include the new B230 blades with 16 cores and 256 GB memory, all in a half size blade. Yes, all this within 60 days. I can’t wait to see the workloads the B230 will handle. Oh, and we will also throw in a new EMC VMAX to push the performance level even higher.
IT in healthcare has a growing demand for performance, agility and uptime, and the above technologies are what will allow organizations to handle the changes. Hang on tight, it is going to be a fun ride.
We have been running Cisco UCS 4 months now and are preparing for a code upgrade and adding more B200 blades to the system for VMware. So I was thinking what do I really have running in production on the system at this point? It makes sense to have a good handle on this as part of our code upgrade prep work. I put together the below information and figured others could find it useful to get a perspective of what is in production on UCS in the real world (note all of the blades refer to the B200 model running with the Emulex card).
Windows 2008 and 2003 Servers:
I will start with a cool one. Tuesday we went live with our VMware vCenter server loaded bare metal on a UCS blade with boot from SAN. This is W2K8 64 bit, vCenter 2.x with update manager and running SQL 2008 64 bit database (used by vCenter). It has 1 Nehalem 4 core CPU and 12 GB of memory and is running sweet. This is a big show of trust in UCS, the center of my VMware world running on it for the enterprise!
2 server blades booting from SAN (1 prod and 1 test) running W2K3 64 bit with Oracle 10g for our document management system. Each has 1 Nehalem 4 core CPU and 48 GB of memory and is running with no issues.
VMware ESX Hosts:
4 production VMware ESX 4.0 hosts with NO EMC PowerPath/VE. All boot from SAN, 2 – 4 Core CPU and 48 GB memory. These 4 ESX servers are configured to optimally support W2K8 64 bit Microsoft clusters. We currently are running 4 – 2 node MS clusters on these blades. They are using about 37% of the memory and not really touching the CPU, so we could easily double the number of MS clusters over time on these blades.
10 production VMware ESX 4.0 hosts with EMC PowerPath/VE. All boot from SAN, 2 – 4 Core CPU and 96 GB memory. Today we have 87 guest servers running on our UCS VMware cluster. This number increases daily. We are preparing for a few application go-lives that use Citrix XenApp to access the application, so we have another 47 of these servers built and ready to be turned on by the end of the month. So we should have well over 127 guest servers running by then on the UCS VMware cluster.
Here is a summary of the types of production applications/workloads that make up the current 87 guest servers:
NOTE: The 10 guest servers listed below for data warehouse are very heavy on memory (3 with 64 GB, etc.) and we have hard allocated this memory to the guest servers. This means the guest is assigned and allocated all 64 GB of memory on boot, even if it is not using it. So these large servers are really using memory resources in VMware differently than what you normally would do with the shared memory function of VMware.
10 servers running data warehouse app; 5 heavy SQL 2008 64 bit servers with the rest being web and interfaces.
15 Document Management servers running W2K3, including IBM WebSphere.
39 W2K3 64 bit servers running Citrix XenApp 4.5 in production delivering our enterprise applications. Together these servers are probably handling applications for about 400 concurrent production users. This will be increasing significantly within 21 days with coming go-lives.
7 W2K8 64 bit servers that provide core Citrix XenApp DB function (SQL 2008) and Citrix Provisioning servers for XenApp servers.
1 W2K3 server running SQL 2005 for computer based learning; production for enterprise.
1 W2K3 server running SQL 2005 for production enterprise staff scheduling system.
3 W2K3 servers running general production applications (shared servers for lower end type apps).
3 W2K3 servers running interface processors for the surgical (OR) application (deals with things like collar-bone surgeries).
1 W2K3 server running a key finance application.
1 W2K3 server running a key pharmacy application.
1 W2K8 server running a pilot SharePoint site (the free version).
There are a few other misc guest servers running as well for various lower end functions, i.e., web servers, etc.
Current VMware Utilization:
Average CPU utilization in the UCS Cluster for the 10 hosts is 8.8%.
Average memory utilization for the 3 ESX hosts running guest servers with hard allocated 64 GB memory: 76%.
Average memory utilization for the 7 ESX hosts running all other workloads: 41%.
We still have a good amount of growth within our UCS Cluster with 10 servers. I believe I could run this full load on 6 blade servers if I had to for a short period of time.
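The 6 blade estimate above can be sanity-checked with some quick arithmetic. This is only a rough sketch: it assumes the 76% and 41% figures are memory utilization on 96 GB hosts, and it ignores hypervisor overhead, failover headroom and CPU (which at 8.8% is not the constraint). The function names are mine, not from any VMware tool.

```python
import math

# Figures from the post; reading 76% / 41% as memory utilization is an assumption.
HOST_MEMORY_GB = 96
HEAVY_HOSTS, HEAVY_UTIL = 3, 0.76   # hosts with hard allocated 64 GB guests
OTHER_HOSTS, OTHER_UTIL = 7, 0.41   # hosts running all other workloads


def memory_in_use_gb():
    """Total memory consumed across the 10 host cluster."""
    return HOST_MEMORY_GB * (HEAVY_HOSTS * HEAVY_UTIL + OTHER_HOSTS * OTHER_UTIL)


def utilization_on(hosts):
    """Memory utilization if the same load ran on `hosts` 96 GB blades."""
    return memory_in_use_gb() / (hosts * HOST_MEMORY_GB)


def minimum_hosts():
    """Smallest number of 96 GB blades that can hold the current load."""
    return math.ceil(memory_in_use_gb() / HOST_MEMORY_GB)


if __name__ == "__main__":
    print(f"In use: {memory_in_use_gb():.1f} GB")
    print(f"On 10 hosts: {utilization_on(10):.1%}")
    print(f"On 6 hosts:  {utilization_on(6):.1%}")
    print(f"Minimum hosts: {minimum_hosts()}")
```

Under those assumptions the load is tight on 6 blades, which is why I would only want to run that way for a short period of time.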
There you have it, a good summary of what a production Cisco UCS installation looks like in the real world.
Yesterday we finished our 3 week implementation with our Cisco Advanced Services engineer. We had started the process with a good size list of tasks we wanted to accomplish and items to test. I believe we covered all of the configuration items and now have working service profiles, policies, pools, etc. the way we want. We also completed all of the hardware related testing that we could come up with. We did things like turning off the primary 6120 and watching it fail over, taking away an Ethernet path and a Fiber Channel path, powering off a chassis and powering off a blade. Everything tended to react and be handled as expected. The errors and warnings were logged in the manager, generally made sense and cleared up once the “failure” was resolved.
We have a server policy for boot sequence to use boot from SAN and then go to PXE LAN boot if the SAN LUN does not exist. Our build process for a new ESX host on a UCS blade calls for it to not see a SAN LUN (this is to be expected since there is no OS yet), so it goes to the next device in the boot order, in this case, PXE boot. The blade PXE boots to our UDA (Ultimate Deployment Appliance) and loads the ESX OS image. After ESX is loaded it reboots and does not fully see the SAN LUN due to some remaining steps. Our workaround is to associate the “almost” built ESX host with another blade, where it completes the process successfully. Strange steps to figure out at first.
vCenter Host Profiles: We discovered an issue with using Host Profiles in vCenter when trying to apply them to an ESX host on a UCS blade with the 10 G CNA adapter. The problem turned out to be that the port speed option was set to auto negotiate and the CNA does not have an auto option. This was resolved by hard setting the NICs in the Host Profile to 10 G and now it functions as expected.
vCenter Distributed Switch: Our 2U servers have 6 to 8 – 1 Gb NICs that we use for connectivity for each ESX host. On the UCS blades we go to only 2 – 10 G NICs. This means we need to have 1 distributed switch for our 2U servers and 1 for our UCS servers. Not a real big deal, but something we had to work out. I expect with the Palo CNA you could logically configure your blade I/O to appear just like your 2U server configuration; however, I am not sure you would want to do it that way.
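The boot order fallback described above can be pictured with a small toy model. This is not UCS Manager code or its API, just a sketch of the decision the boot policy makes; the `Blade` class, its fields and `pick_boot_device` are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Blade:
    """Hypothetical stand-in for a blade with a service profile applied."""
    san_lun_visible: bool             # can the blade see a bootable SAN LUN?
    pxe_server_reachable: bool = True  # is a PXE server (e.g. the UDA) answering?


def pick_boot_device(blade, boot_order=("san", "pxe")):
    """Walk the boot order and return the first device that can boot, else None."""
    for device in boot_order:
        if device == "san" and blade.san_lun_visible:
            return "san"
        if device == "pxe" and blade.pxe_server_reachable:
            return "pxe"
    return None


if __name__ == "__main__":
    new_blade = Blade(san_lun_visible=False)    # no OS yet, falls through to PXE
    built_host = Blade(san_lun_visible=True)    # built host boots from SAN
    print(pick_boot_device(new_blade))
    print(pick_boot_device(built_host))
```

A brand new blade falls through to PXE and gets its image, while a fully built host stops at the first entry and boots from SAN.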
Is UCS a more or less complex way to provide compute power?
Early in our implementation process two of our engineers were debating this question. Yes, there are more “things” to set/define for a service profile (server), which makes it seem more complex. On the other hand, I would say the additional “things” are not really new. They are things you had to deal with in traditional blades or servers; it was just done in a non-central manner. In addition, once you have defined your configurations and standards using UCS tools such as pools, policies and service profiles, you end up with a more consistent and easier to manage environment. So I think the complexity is really in the initial process of thinking through how to best utilize the pools, policies and service profiles. This takes some time to define with the correct people in your organization. For example, your storage person needs to understand how defining your own WWPNs can streamline your zoning and registration process. Once these types of functions are laid out, the complexity is minimal.
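To make the WWPN point concrete, here is a minimal sketch of why a self-defined pool helps the storage person: if the addresses follow a predictable pattern, they can be zoned and registered before a blade is ever associated. The prefix, the idea of encoding the fabric in one octet, and the function name are all illustrative assumptions, not our actual pool definition and not a UCS Manager API.

```python
def wwpn_block(prefix="20:00:00:25:B5", fabric_id=0x0A, start=0, size=4):
    """Return `size` sequential WWPNs as colon separated strings.

    Layout (an assumed convention): 5 prefix octets, 1 fabric octet,
    then a 2 octet counter, for 8 octets total.
    """
    wwpns = []
    for i in range(start, start + size):
        if i > 0xFFFF:
            raise ValueError("counter does not fit in two octets")
        suffix = f"{fabric_id:02X}:{i >> 8:02X}:{i & 0xFF:02X}"
        wwpns.append(f"{prefix}:{suffix}")
    return wwpns


if __name__ == "__main__":
    # One block per fabric, so a WWPN tells you at a glance which side it is on.
    for wwpn in wwpn_block(fabric_id=0x0A, size=2) + wwpn_block(fabric_id=0x0B, size=2):
        print(wwpn)
```

With a scheme like this the zoning can be written ahead of time from the list, rather than waiting to read burned-in addresses off each new card.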
How long did implementation really take?
I would say racking, stacking, cabling and powering up the system was accomplished over a 2 day period.
Once the Cisco implementation engineer arrived we spent 1 day going over concepts, scenarios and some white boarding.
We jumped into about 2-3 days of “playing around” in the system creating pools, messing with policies, creating service profiles, etc. This was our hands-on training process with about 3 or 4 team members. During this time we configured the “northbound” LAN connections to our 6500 switch, which took a little extra time due to the confusion with different terms used for “native” VLAN. We also took care of the SAN connectivity to the Cisco MDS switches, which was very straightforward.
The next 2 to 3 days were used to clean up our sandbox configurations and build our production configuration for pools, policies, templates, etc. We also determined we would move to boot from SAN for all UCS blades to take full advantage of the stateless functionality. For organizations already familiar with boot from SAN on a Clariion for an ESX environment this would eliminate 1 day of work.
1 day was used to test all of the hardware failure scenarios that we had come up with: pull out all 8 fans, pull out 3 of 4 power supplies, kill the power to a chassis, etc.
By the end of the second week our focus shifted from how to build and use UCS to actually building our production ESX hosts with our new processes. I would say 2 of the days were heavy on trial and error and determining the best process within UCS Manager, vCenter and Navisphere. The remainder of the week was normal vCenter work building ESX hosts and VMware guests.
– 2 days spent before implementation kickoff
– 5 days used for concepts, training, sandbox/hardware testing and architecting the environment
– 3-4 days configuring, building and working out processes for the production environment
– 4 days of performing our vCenter VMware related build processes for the guest servers (we are also doing Microsoft clustering with guests under VMware which has its own restrictions and requirements).
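Adding up the summary above gives the overall effort. A quick sketch of the sum, with the phase labels shortened by me and the only range (3-4 days) carried through as a low and high value:

```python
# (low, high) working days per phase, taken from the summary list above.
PHASES = {
    "before kickoff": (2, 2),
    "concepts, training, sandbox, architecture": (5, 5),
    "production configuration and processes": (3, 4),
    "vCenter build work for guests": (4, 4),
}


def total_days(phases):
    """Return the (low, high) total of working days across all phases."""
    low = sum(days[0] for days in phases.values())
    high = sum(days[1] for days in phases.values())
    return low, high


if __name__ == "__main__":
    low, high = total_days(PHASES)
    print(f"Total effort: {low} to {high} working days")
```

That comes to 14 to 15 working days in total, of which 12 to 13 fall after the implementation kickoff.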
The UCS system met or exceeded my expectations. Terms that come to mind that describe UCS are simplified management, elegant design, paradigm shift, future of computing, time and cost saver, etc. Cisco got it right; I look forward to the changes brought by UCS.
Implementation of VMware ESX for server virtualization in our environment changed the way we function in a very positive way. I see Cisco UCS having a very similar positive impact on our organization. It expands on our VMware environment by reducing physical complexity, streamlining the SAN and LAN configurations, and reducing cabling and required I/O connectivity infrastructure. It adds flexibility with further abstraction of the server hardware. I see the use of UCS outside of VMware ESX hosts as a big step forward as well. By using boot from SAN and service profiles, we now have the ability to easily move non-VMware workloads to different hardware blades in the same UCS or to a secondary site (using future functionality in BMC software, etc.). This functionality will be huge and will be a UCS growth area for us.
Short-term plans are to grow UCS to additional chassis to support immediate projects, add a blade to run my vCenter server (W2K8 directly on the blade), and move all existing UCS blades running ESX to the new Palo card when it becomes available.
Long term, using UCS for a secondary datacenter would make a lot of sense by reducing the complexity and increasing our flexibility. This can be accomplished by using VMware SRM as well as the functionality of the UCS Service Profiles.
Check out Cisco UCS if you get a chance; it is well worth the time.
Server virtualization has been great for our organization since we implemented VMware 3.x over 2 years ago. We have grown our VMware environment to 24 hosts with over 300 guest servers and upgraded recently to vSphere 4.0.
The server hardware we have been using for our ESX hosts has been 2U rack mounted servers with 2 – quad-core CPU, 32 GB or 96 GB memory, 2 HBA, and 6 to 8 – 1 Gb Ethernet NICs.
Recently we had a project need to add 16 additional ESX hosts to our environment. At the same time I began to learn about Cisco’s new blade server system, the UCS (Unified Computing System). My first impression was confusion from what I had been reading in the trade rags. Then I had a presentation from Cisco on the topic and I was intrigued when it was compared to a SAN but for compute capacity. Meaning you have 2 “controllers” and you add blades and chassis for additional capacity.
At VMworld 2009 I was able to dig deeper into UCS, which was center stage at the conference. When you entered the Moscone Conference Center, the datacenter that supported the conference was 512 Cisco UCS blades located in the main lobby for all to see. Cisco had the right people on hand to educate me on what UCS is, how it works and why it was built.
So for my project that requires 16 more ESX hosts I did a deep comparison of my traditional approach with converged 10 G network (2U rack mount) vs. a Cisco UCS configuration. Looking at the cost differences, what it takes to grow both approaches and the amount of required infrastructure (Ethernet and Fiber Channel cabling and ports) Cisco UCS made the most sense. There is some risk going with a new server “system”, however my previous experience with Cisco has always been very positive.
We have jumped in with both feet with Cisco UCS . . . I will bring you along for the ride.