I am geeked! We just completed the code upgrade on our production Cisco UCS environment and it went off without a hitch. We have been in production on Cisco UCS for 1 year and 22 days now and have run on 1.1, 1.2, 1.3 and now 1.4 code. So today was our third code upgrade on a production environment, and each time things have gotten better and cleaner. Why am I so excited? Think about it . . . with UCS
there are 2 Fabric Interconnects with some chassis hanging off of them, with a bunch of servers, all using a single point of management. Everything is connected with redundancy, and if all of that redundancy is operational and live, you truly can reboot half your IO fabric and not drop a ping or a storage connection.
In a storage world this is standard and expected, but in a server blade world you would think that achieving the same level of high availability and uptime a SAN provides would require a lot of complexity. Enter Cisco UCS! An
hour ago we upgraded and rebooted half our IO infrastructure, which serves over 208 production VM guest servers running on 21 VMware ESX hosts plus another 8 Windows Server blades (all running active SQL or Oracle databases), without dropping a packet. Then I did the same thing to the other IO infrastructure path with NO ISSUES. This is just badass. I suspect in a year this type of redundancy and HA level in an x86 server environment will be an expectation and not an exception.
UCS Code Upgrade Experiences:
In March 2010 we performed the first upgrade while in production, to 1.2 code (you can check out my blog post for all the details). The major impact we experienced with this one was due to a human issue: we forgot to enable spanning tree PortFast for the EtherChannels connecting our Fabric Interconnects. Our fault, issue fixed, move on. In December 2010 we implemented 1.3 code for a few reasons
mainly related to ESX 4.1 and the Nexus 1000V. Our only issue here was with one Windows 2003 64-bit server running on a B200 blade, where OS NIC teaming failed to work correctly. Again, not a UCS code issue but a server OS teaming issue. We had 3 servers using NIC teaming in the OS, so we decided to switch these servers to the hardware failover mode provided in UCS instead of in the OS. Changes made, ready to move on. It just so happened that on
the same day we did the 1.3 upgrade Cisco released 1.4 code just in
time for Christmas (thanks SAVBU). This time we had all our
bases covered and each step worked as expected; no spanning tree
issues, no OS NIC Teaming problems, it was smooth! There was
some risk in moving to the new code so fast, but we have several projects that need the new B230 blades ASAP. Several UCS users and partners have already been through 1.4 testing, and things have been looking very good. Thanks to all who provided me with feedback over the last week.
Features and Functions:
Now we get to dig into all the cool new features and functions in the new code. I am impressed already. I will put together a separate post with my first impressions. I do want to point out one key thing that I referenced above: the need to upgrade the infrastructure to use new hardware (B230 blades). Now that I am on 1.4 code this requirement is gone. Yep, with 1.4 code, they have made changes that will NOT require an upgrade of the IO infrastructure (Fabric Interconnects and UCS Manager) to use new hardware like a B230. So yes, things are sweet with Cisco UCS and it just got sweeter.
The second day started with my favorite session so far, and I did not expect it. I am not a programmer, but I thought the UCS XML API session was great. I have heard a few times that UCS can be fully managed through the UCS Manager GUI, the XML API or the CLI, but I had never dug deeper. Catherine Liao gave us a 15-minute primer on XML, XML Schema, etc. to get us started. Then we got into a few ways you can code to the XML API and saw API responses and error handling. What brought out the geek in me was talking about different use cases. Here are the notes from that section:
1. Display System: You could use this type of example to generate a report of service profiles, etc., and you could include firmware versions.
2. Mobile System Monitor: there is a beta iPhone app that uses the API; it took the developer about 80 hours to build. Check it out at http://www.timeline.com/simu
3. Auto provisioning of systems, like what Cisco IT is using to deploy workloads.
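To make the first use case concrete, here is a minimal sketch (not production code) of a service profile report driven through the UCS XML API. The host name and credentials are placeholders; aaaLogin, configResolveClass, aaaLogout and the lsServer class are part of the documented API, but check the API reference for your UCS Manager version before relying on exact attribute names.

```python
# Sketch of the "Display System" use case against the UCS Manager XML API.
# The UCSM address and credentials below are hypothetical placeholders.
import ssl
import urllib.request
import xml.etree.ElementTree as ET

UCSM = "https://ucsmanager.example.com/nuova"  # hypothetical UCS Manager endpoint


def post_xml(body: str) -> str:
    """POST one XML API method to UCS Manager and return the raw XML reply."""
    ctx = ssl._create_unverified_context()  # UCSM often runs a self-signed cert
    req = urllib.request.Request(UCSM, data=body.encode(), method="POST")
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode()


def parse_service_profiles(reply: str) -> list:
    """Pull name / associated-blade pairs out of a configResolveClass reply."""
    root = ET.fromstring(reply)
    return [{"name": ls.get("name"), "blade": ls.get("pnDn")}
            for ls in root.iter("lsServer")]


def report() -> list:
    """Log in, query every service profile (lsServer), log out, return the list."""
    login = post_xml('<aaaLogin inName="admin" inPassword="password" />')
    cookie = ET.fromstring(login).get("outCookie")  # session cookie for later calls
    reply = post_xml(f'<configResolveClass cookie="{cookie}" '
                     'classId="lsServer" inHierarchical="false" />')
    post_xml(f'<aaaLogout inCookie="{cookie}" />')
    return parse_service_profiles(reply)
```

The pnDn attribute is the distinguished name of the blade a profile is associated with; a similar query against a firmware class (if your UCSM version exposes one) would let you fold firmware versions into the same report, as the session notes suggest.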
The other highlight of the day was the keynote with John Chambers. I always like to hear him speak; he knows how to communicate his ideas very well to a crowd. The big WOW from the session was the announcement of the new Cisco Cius (pronounced “See-Us”) tablet device. The first reaction is that it is an iPad competitor, but it is really focused as a collaboration tool, with unified communications (“Video as the new Voice”) as a key piece, and it is positioned as a business, education and communications device. Afterward, when everyone was leaving the event, I noticed John Chambers still hanging out and talking with people. So I headed down and had a few minutes to chat with him and get a picture.
It was a pretty cool day that ended on the patio at the top of Mandalay Bay, looking out on the Strip and listening to great live music from Phat Strad. Thanks to the Cisco datacenter group for organizing this fun event.
I am a little late on my day one blog but here it is . . . The focus of my first day was around the core network in the datacenter. We have Cisco UCS so I attended a few sessions on FCoE and the Nexus 7000 hardware.
The FCoE session covered basic concepts for the first half, and then they dug into the details of how FCoE is really FC, just contained in an Ethernet frame. So it was a good refresher on FC concepts and how things map under FCoE. We also got an understanding of how the FCoE protocol has been built to provide the lossless behavior needed to support Fibre Channel traffic by using flow control. During this session I wanted to log in to one of my Cisco MDS FC switches to check out some show commands as we covered them. Since I am only using my iPad this week, I used the Citrix Receiver app to remote to my desktop. It worked very well, pretty cool.
After the Nexus 7000 hardware session, I now know the basics of how it is structured. Since my experience is with the 6500, what stood out for me was that the switch fabric is not on the supervisor but on its own fabric modules that go in the back of the chassis. This allows for up to 5 fabric modules and up to 230 Gbps of bandwidth. Based on your needs you can start small (3 is the base recommendation for redundancy and adequate capacity) and add to it over time. Today Cisco announced a new module line called the F-Series that will let you add IO modules able to start reaching that 230 Gbps of bandwidth. Another interesting thing from today: next year there will be a 9-slot Nexus 7000 as well as new fabric modules that will push the bandwidth up to 560 Gbps (I am going on memory for that number).
I shifted gears in the afternoon and attended an IT management session from Cisco’s Paul McNab focused on aligning IT to business strategy. It was a pretty cool discussion that gave insight into how Cisco has approached disrupters as opportunities and how collaboration can increase your speed of response and broaden your scope. An example that stood out for me: the technology we purchase today may only have 18 months of life, which means IT needs to get value out of that investment very fast or you will lose your competitive footing. Using new tools and technologies such as collaboration tools and unified computing to maximize your time and resources is one way IT can bring value to the business. This session is one I would not have expected to see at a tech conference; however, it fits well into my interests.
I finished out the day on the show floor, the World of Solutions. It has been a few years since I attended a Cisco conference, and it reminded me how diverse Cisco is. You see cabling, storage, data center, server component and voice vendors all in one room. It was a great first day, and I look forward to the rest of the week.
I am on a plane headed to Las Vegas for this year’s Cisco Live conference. It has been a few years since I last attended Cisco Networkers, back before it evolved and expanded, just as Cisco and the rest of IT have changed. Back then I was interested in wireless, network security and just getting an understanding of storage area networks and the MDS line of gear.
Today I continue to have a heavy focus on the importance of the datacenter, but there is a wide variety of technologies that all have to come together for a health system to deliver exceptional patient care.
It is time again for our organization to refresh our wireless to support greater bandwidth by using 802.11n, provide for a substantial increase in the number of wireless devices and simplify the management of it.
How we look at our LAN moving forward will need to expand the support of Quality of Service (QoS) to handle more video and voice. How it will all come together will be key as we move forward. Just like we are building in high availability and redundancy in the datacenter we have to focus similar attention on the network. This is the northbound traffic heading out of the datacenter to all of the end node devices no matter what, where and how they are connected. All of this done in a secure manner, of course.
Back in the datacenter, the key technologies interesting to me are the new things coming in the Nexus product line and how they will tie in with UCS. As we expand and scale out our compute capacity on UCS, what is going to be the most efficient and cost-effective way to deliver this high level of service?
I am looking forward to this week, and it should be a good time as well. I always like to hear John Chambers’s keynote; I wonder what his focus will be this year. Cloud computing?
A common question I get when talking with others about our Cisco UCS production environment is whether we have had any issues that required us to deal with Cisco TAC. As with anything, we have had a few things that required a call. By the way, the phone number is the same for any Cisco product. Here are a few examples.
One of our first calls had to do with a failed 8 GB DIMM in one B200 M1 server blade. We noticed a warning light on the blade and went to UCS Manager to investigate. We were able to quickly drill down to the affected blade’s inventory and go to the memory tab. This screen provided the details of the failed DIMM’s slot location and confirmed its failed status. Since the workload running on this blade was VMware ESX, we put it into maintenance mode, powered down the blade and replaced the DIMM with a spare. It was time to open a ticket with TAC.
The TAC engineer took down our information and sent out a replacement DIMM within 4 hours, and we were done with the ticket. I asked our server person what he thought of dealing with TAC, and he said he did not expect it to be that easy. Typically in the past, with other server vendors, we would have had to run a diagnostic tool to determine which DIMM failed and then open a trouble ticket. We would have had to down the server, re-seat the DIMM and wait for it to fail again; only once it failed again would we get a replacement. So this call process with Cisco seemed much smoother.
Another trouble ticket was related to a VMware ESX host that, after a reboot, would not see its boot partition. After some troubleshooting, it was clearly an ESX OS issue, and our VMware admin was ready to re-image the server. However, we thought this would be a good test for Cisco TAC, so we opened a ticket. We were surprised when TAC gave the case to an ESX server person at Cisco, who within 20 minutes had resolved the issue and had the server back in production. Our expectations were exceeded again.
The one trouble ticket that took some time was when we wanted to install Windows 2003 Standard 64-bit bare metal on a blade with the Emulex interface card. This is easy to do with Windows 2008; however, the challenge was getting the right drivers onto a media type that the Windows 2003 installation process could recognize. It wanted to see the drivers on either a CD or a floppy disk, which we provided by emulating the media. I personally did not work this ticket, but it took about 3 days to get everything completed. In the end we now have the process down and 2 servers in production.
Overall, Cisco has exceeded our expectations when it comes to successfully handling trouble tickets around the UCS products. It has been clear to us that Cisco has put the resources into support and has the right folks in place to deal with the variety of potential issues customers may run into.
We have been spending some time cleaning up the datacenter, pulling out all of the old server hardware left over from migrations to a virtual environment. In this most recent round of cleanup there are over 60 old physical servers in these stacks, which provided a lot of compute cycles for us in the past. Their time has come to an end. And to think those 60 workloads can now easily run on 2 Cisco UCS B200 M1 blades with VMware ESX 4.x and EMC PowerPath/VE!
Next week is EMC World 2010 in Boston, and I am fortunate enough to be attending and presenting. If you are there, check out the following presentations on Cisco UCS in a production healthcare environment:
Monday, May 10, 11:10 AM to 11:25 AM: Cisco booth theater presentation: Cisco UCS Solving Business Challenges: Moses Cone Health System
Tuesday, May 11, 12:25 PM to 12:40 PM: Cisco booth theater presentation: Cisco UCS Solving Business Challenges: Moses Cone Health System
Wednesday, May 12, 3:30 PM to 4:30 PM: General session: Implementing Cisco Data Center 3.0: Cisco IT and Moses Cone Health System
The 2 sessions in the Cisco booth theater will be a quick overview of our UCS experience, and the full session on Wednesday will be done in conjunction with Sidney from Cisco. That session will focus on how Cisco IT has implemented and benefited from Cisco UCS, and then I will speak about our experience in more detail.
Hope to see some of you there.
Well, that was a great day! When I was invited to represent my organization at the Cisco datacenter launch to talk about our UCS experience, I was humbled, excited and nervous. It is not often in someone’s career that you have the opportunity to be included on a panel with such innovative leaders in the technology industry as David Lawler, Soni Jiandani, Boyd Davis and Ben Gibson. Everyone was down to earth, personable and very comfortable to work with on the panel. The goal was to make the event a relaxed discussion, and the customer’s point of view was truly important to the panel and to Cisco. I was also amazed at how many people pulled the details together for an event of this nature. Lynn, Janne and Marsha were great, making sure I was prepared and helping everything go off smoothly.
This customer focus continued to be evident after the video was completed. I was able to spend the rest of the day with many key individuals from the UCS business unit who made time for me. We had some deep technical discussions on various topics like firmware upgrades, wish lists, directions, ease of use vs. levels of control, etc. Everyone asked me for input on ways to improve, and we talked about how we are using the system.
To end my day on the Cisco campus, David Lawler invited me to his office to meet with him and Mario Mazzola, Senior Vice President of the Server Access and Virtualization Business Unit (SAVBU). Mario has been a key technology person in Silicon Valley, leading the creation of the Catalyst 6500 switch, the Cisco MDS Fibre Channel product line and now the Cisco UCS platform (along with many other accomplishments). I think it is fair to say he is a legend in the industry (though my impression is that he is very humble and quickly acknowledges others for their contributions to these projects). We had a conversation focused on customer views of the product and on Cisco’s goal to continually improve the system. Mario and David are very down-to-earth people, and it was clear to me that Cisco is very customer focused from the top of the organization down.
So that’s a wrap for this trip to Cisco in San Jose for now . . .
Cool things are happening . . . check out the site for Cisco Datacenter 3.0.
My organization was invited to participate in a customer case study about the success we have had with Cisco UCS in a production environment. The written and video case studies are now available on Cisco’s web site and are featured on the main page that outlines the new additions and innovations coming for the datacenter.
Tomorrow, April 6, 2010, the official launch will take place at 1 PM EST and can be viewed live online (link to register is above). I have been invited to participate in the event and am honored to represent my organization with Cisco. So check it out; there are some very cool things coming right around the corner.
Disclaimer: The views and opinions expressed on this blog are my own and are not endorsed by any person, vendor, or my employer. That is to say, the stuff on my blog is written not as an employee but as just another healthcare IT guy.
Working in the IT group of a hospital environment, you tend not to have much contact with patients. Your focus tends to be on what is the best technology to get the job done in a cost-effective manner. I personally try to think about a family member or friend being in the hospital who will be dependent on the decisions and directions we set in IT. Case in point: in selecting Cisco UCS to run a significant amount of our clinical applications, I had to have enough trust in the system that it would function correctly.
I broke my collarbone 5 weeks ago, and it finally needed to be surgically repaired on Tuesday. Yesterday, as I was looking at the x-ray with the new plate and 6 screws, I realized that several of the servers that make up our surgical application system are running live on Cisco UCS. Meaning I trust Cisco UCS so much that I had no concerns about having surgery supported by systems running on Cisco UCS. In fact, because of my knowledge of UCS, my comfort level was higher, because it is more redundant and flexible.