Wednesday, March 24, 2010

Locked ESX Virtual Machines

Here is the scenario.  Power outage at 2am.  The outage lasted longer than our battery life.  Our roadmap includes scripts that will shut everything down gracefully when the UPS sends a low battery signal, but for the time being we do not have them.

We initially had some fun bringing everything back up in order.  It gets particularly fun when your AD Domain Controllers, DNS, and DHCP are all virtual.  You get some nice little chicken-and-egg issues, but we have learned our lesson and are going to build a physical DNS server, DHCP server, and DC so they can come up before the virtual environment.

The real meat of our problem was some of the virtual machines.  It wasn't directly related to the power outage either.  Our core switches, which are on a separate UPS and did not lose power, decided to go Tango Uniform right after we got most of the boxes back up.  Unfortunately, our Netgear core switches are not sending logs to a syslog server and the log files do not persist across a reboot (I know, I think it's stupid too).  This means we have no way of knowing why they went stupid on us.

So, we have learned some lessons and moved on.  On to what I want this article to cover.  When we lost the switches, the virtual machines lost their connection to their VMDKs.  We use NFS to connect to the datastore, so when we restored the switches, most of the virtual machines just flushed their writes and went on their merry way.  Some virtual machines, however, did not.  I inspected the vmware.log file stored with the virtual machine files to see what happened, and I noticed that all of the virtual machines that were locked up (most of which were Windows XP boxes) had the following log message:

Mar 23 13:34:38.731: vmx| VMX has left the building: 0.
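
To find every virtual machine that logged this message without opening each log by hand, something like the following could be run from any machine with Python 3 that can mount the NFS datastore.  This is only a sketch: it assumes each VM lives in its own directory directly under one root and keeps its current log as vmware.log.  The function name find_dead_vms and the root path you pass in are illustrative, not part of any VMware tool.

```python
from pathlib import Path

# The marker text comes from the log line quoted above.
MARKER = "VMX has left the building"


def find_dead_vms(datastore_root):
    """Return the sorted names of VM directories whose vmware.log contains MARKER.

    Assumes a layout of <datastore_root>/<vm name>/vmware.log (an assumption
    about the datastore, not something VMware guarantees).
    """
    dead = []
    for log in Path(datastore_root).glob("*/vmware.log"):
        # errors="replace" keeps odd bytes in a log from aborting the scan
        text = log.read_text(errors="replace")
        if MARKER in text:
            dead.append(log.parent.name)
    return sorted(dead)
```

Calling find_dead_vms with the mount point of the datastore returns the directory names worth checking first.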

So, we have determined that VMware just gave up on trying to talk to its VMDK file after some amount of time and the VMX decided to ditch this party.  Ok, so here is the procedure I had to go through to get the darn things back.

First, we need to get the virtual machine into an 'off' state.  This is not easy, nor is it intuitive.  What I had to do was the following.

From the service console of the ESX Server running the VM, find the vmid of the virtual machine in question.  To do so, run the following command and grep for the virtual machine name:

cat /proc/vmware/vm/*/names | grep vdi-rivey

The return should start with vmid=####.  Take this number into the next command, where we are looking for the VM group.

less -S /proc/vmware/vm/####/cpu/status

You are looking for the vmgroup, which will look something like vm.####.  Next, using the VM group ID number, feed it into the following command to run an effective kill -9 on the VM within the VMKernel.  Note:  be sure to run this command as root (or use sudo).

/usr/lib/vmware/bin/vmkload_app --kill 9 ####
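
The two lookups above amount to pulling one number out of each command's output.  A rough sketch of that parsing is below.  The sample strings in the comments are made up for illustration (only the vmid=#### prefix and the vm.#### group format come from the steps above), and the helper names are hypothetical.  The sketch only builds the kill command as a list; it does not run anything.

```python
import re


def parse_vmid(names_line):
    """Extract #### from a line that starts with vmid=####, or return None."""
    match = re.match(r"\s*vmid=(\d+)", names_line)
    return match.group(1) if match else None


def parse_vmgroup_id(status_text):
    """Extract #### from the first vm.#### token in the cpu/status output."""
    match = re.search(r"\bvm\.(\d+)\b", status_text)
    return match.group(1) if match else None


def kill_command(vmgroup_id):
    """Build (not run) the vmkload_app command from the step above."""
    return ["/usr/lib/vmware/bin/vmkload_app", "--kill", "9", vmgroup_id]


# Hypothetical sample output, for illustration only:
#   parse_vmid('vmid=1234 displayName="vdi-rivey"')
#   parse_vmgroup_id("vcpu vm name group\n1234 1234 vmm0:vdi-rivey vm.5678")
```

Keeping the command as a list makes it easy to print and double check the ID before handing it to something like subprocess as root.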

Once this command is complete, the virtual machine still shows as though it's in a Powered On state.  To get the VIC to figure this out, I had to restart some daemons on the ESX server.  Please note, I disabled VMware HA and DRS on my cluster because of some of the issues I was having; I am not sure what HA will do with the VMs on this ESX server if you run these commands while HA is enabled.

service vmware-vpxa restart
service mgmt-vmware restart

The virtual machines running on this ESX Server and the ESX Server itself will gray out in your VIC while the services restart.  When everything is back to normal, the VM in question will be grayed out, with (invalid) appended to the name.

Next, I removed the virtual machine from the inventory, then browsed to it in the datastore, right clicked on the vmx file, and added it back into my inventory.  It still is not ready to boot because it has a couple of .lck directories (lock files).  In the service console, I changed into the virtual machine's directory and ran the following command to blow away all of the locks:

rm -rf .lck*
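
Since rm -rf is unforgiving, a more cautious variant of the same cleanup is sketched below.  It lists the .lck* entries first and only deletes when asked to.  This is an illustration under the same assumption as the command above (the locks are entries named .lck* directly inside the VM directory); remove_locks and its dry_run flag are hypothetical names, and it needs Python 3 on a machine that can see the datastore.

```python
import shutil
from pathlib import Path


def remove_locks(vm_dir, dry_run=True):
    """List, and optionally delete, .lck* entries directly inside vm_dir.

    Returns the names of the entries found. With dry_run=True nothing is
    removed, so the list can be reviewed first.
    """
    found = []
    for entry in sorted(Path(vm_dir).glob(".lck*")):
        found.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # lock directory and its contents
        else:
            entry.unlink()        # plain lock file
    return found
```

Run it once with the default dry_run=True to see what would go, then again with dry_run=False once the VM is confirmed to be off.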

After this was done, I was able to boot the VM back up!  Unfortunately, good ole Windows had some issues on 3 of my 60+ virtual machines.  These virtual machines boot, but promptly lock up.  I am not sure why this happened to a small subset of VMs, but I am attributing it to disk corruption.  The OS lost access to the disk and we killed the virtual machine without flushing the writes, so that could have been the problem.  Luckily, we take snapshots of the volume that holds the VMs every night at midnight.  I simply copied the entire VM directory from this backup, blew away the lock files again, added it to inventory, and bam, instant restore from backup.

Everything is back up and running.  We have a few infrastructure changes to make so that recovery from a down state goes much quicker, we have a new reason to push for the scripts to bring everything down gracefully, and I have a procedure for unlocking a virtual machine.  We have also (again) verified that our backups are working like a champ!

Friday, March 12, 2010

Novell IDM XPATH

This one is a pretty fun example.  I have a user coming from a payroll system.  The user has a PayrollCode identifier on them.  Unfortunately, this payroll code is not completely unique, so I have to query another object in eDirectory to get the uniqueCode.  To do this I will be using XPATH.  I have posted the actions XML of the rule below.  I'll break it apart and explain.

        <actions>
            <do-set-local-variable name="lv.PRCode" scope="policy">
                <arg-string>
                    <token-attr name="PRCode"/>
                </arg-string>
            </do-set-local-variable>
            <do-set-local-variable name="facnode" scope="policy">
                <arg-node-set>
                    <token-xpath expression='query:search($destQueryProcessor,"subordinate","","dn\of\subtree\I\want\to\search","ObjectClassName","PRCode",$lv.PRCode,"ReturnAttr1,uniqueCode,ReturnAttr3")'/>
                </arg-node-set>
            </do-set-local-variable>
            <do-for-each>
                <arg-node-set>
                    <token-local-variable name="facnode"/>
                </arg-node-set>
                <arg-actions>
                    <do-trace-message level="1">
                        <arg-string>
                            <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr1']/value"/>
                        </arg-string>
                    </do-trace-message>
                    <do-set-local-variable name="lv.ReturnAttr1" scope="policy">
                        <arg-string>
                            <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr1']/value"/>
                        </arg-string>
                    </do-set-local-variable>
                    <do-if>
                        <arg-conditions>
                            <and>
                                <if-global-variable mode="nocase" name="GCVAttr1" op="equal">$lv.ReturnAttr1$</if-global-variable>
                            </and>
                        </arg-conditions>
                        <arg-actions>
                            <do-set-dest-attr-value class-name="User" name="uniqueCode">
                                <arg-value>
                                    <token-xpath expression="$current-node[1]/attr[@attr-name='uniqueCode']/value"/>
                                </arg-value>
                            </do-set-dest-attr-value>
                            <do-set-dest-attr-value class-name="User" name="Attr3">
                                <arg-value>
                                    <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr3']/value"/>
                                </arg-value>
                            </do-set-dest-attr-value>
                        </arg-actions>
                        <arg-actions/>
                    </do-if>
                </arg-actions>
            </do-for-each>
            <do-if>
                <arg-conditions>
                    <or>
                        <if-op-attr name="uniqueCode" op="not-available"/>
                        <if-op-attr name="Attr3" op="not-available"/>
                        <if-op-attr mode="nocase" name="uniqueCode" op="equal"/>
                        <if-op-attr mode="nocase" name="Attr3" op="equal"/>
                    </or>
                </arg-conditions>
                <arg-actions>
                    <do-trace-message level="1">
                        <arg-string>
                            <token-text xml:space="preserve">No matching facility object found, uniqueCode and Attr3 not set.  Veto'ing transaction.</token-text>
                        </arg-string>
                    </do-trace-message>
                    <do-veto/>
                </arg-actions>
                <arg-actions/>
            </do-if>
        </actions>


Ok, now for the breakdown.  The first section actually executes the meat of our sample; it's the XPATH portion.  First I set a local variable so I don't have to query back on my JDBC driver if the attribute is not readily available.  I can just grab it and store it once in our policy.  Then, I run the XPATH query and set the nodeset to another local variable.

            <do-set-local-variable name="lv.PRCode" scope="policy">
                <arg-string>
                    <token-attr name="PRCode"/>
                </arg-string>
            </do-set-local-variable>
            <do-set-local-variable name="facnode" scope="policy">
                <arg-node-set>
                    <token-xpath expression='query:search($destQueryProcessor,"subordinate","","dn\of\subtree\I\want\to\search","ObjectClassName","PRCode",$lv.PRCode,"ReturnAttr1,uniqueCode,ReturnAttr3")'/>
                </arg-node-set>
            </do-set-local-variable>

The query uses the destQueryProcessor.  We put in the DN of the subtree we want to search (so the query doesn't take forever).  We are looking specifically at objects of class "ObjectClassName".  We are matching the PRCode attribute with the value in the lv.PRCode local variable.  Finally, for each resulting object we find, we want to grab ReturnAttr1, uniqueCode, and ReturnAttr3 attributes from it.
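
To make the shape of the result a little more concrete, here is a small Python sketch that walks a made-up result document with the same kind of path the policy uses (attr[@attr-name='...']/value).  The XML below is a hypothetical example of what a query result with instance, attr, and value elements might look like, with invented values; it is not captured from a real driver trace.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result, for illustration only.
SAMPLE_RESULT = """
<output>
  <instance class-name="ObjectClassName">
    <attr attr-name="ReturnAttr1"><value>alpha</value></attr>
    <attr attr-name="uniqueCode"><value>U-001</value></attr>
    <attr attr-name="ReturnAttr3"><value>x3</value></attr>
  </instance>
</output>
"""


def instances(xml_text):
    """Return the <instance> elements, one per object the query found."""
    return ET.fromstring(xml_text).findall("instance")


def attr_value(instance, attr_name):
    """Text of the first <value> under the named <attr>, or None if absent."""
    return instance.findtext("attr[@attr-name='%s']/value" % attr_name)
```

Each instance plays the role of $current-node in the policy, and attr_value does the job of the token-xpath expressions.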

The next thing we are going to do is loop through all of our resulting nodes.

            <do-for-each>
                <arg-node-set>
                    <token-local-variable name="facnode"/>
                </arg-node-set>
                <arg-actions>
                    <do-trace-message level="1">
                        <arg-string>
                            <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr1']/value"/>
                        </arg-string>
                    </do-trace-message>
                    <do-set-local-variable name="lv.ReturnAttr1" scope="policy">
                        <arg-string>
                            <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr1']/value"/>
                        </arg-string>
                    </do-set-local-variable>

We grabbed the local variable facnode that is holding our result set.  The for-each will loop through each result.  I put some debug code in there to echo the result of each loop to the trace file; it's not necessary, but it helps when stepping through the code in the trace.  In the result, we grab the ReturnAttr1 value and set it to a local variable lv.ReturnAttr1.  The next thing we are going to do is verify whether lv.ReturnAttr1 meets our other criterion of matching a GCV.

                    <do-if>
                        <arg-conditions>
                            <and>
                                <if-global-variable mode="nocase" name="GCVAttr1" op="equal">$lv.ReturnAttr1$</if-global-variable>
                            </and>
                        </arg-conditions>

Pretty straightforward.  If there is a match, we execute the following section of code.

                        <arg-actions>
                            <do-set-dest-attr-value class-name="User" name="uniqueCode">
                                <arg-value>
                                    <token-xpath expression="$current-node[1]/attr[@attr-name='uniqueCode']/value"/>
                                </arg-value>
                            </do-set-dest-attr-value>
                            <do-set-dest-attr-value class-name="User" name="Attr3">
                                <arg-value>
                                    <token-xpath expression="$current-node[1]/attr[@attr-name='ReturnAttr3']/value"/>
                                </arg-value>
                            </do-set-dest-attr-value>
                        </arg-actions>
                        <arg-actions/>
                    </do-if>
                </arg-actions>
            </do-for-each>

If there is a match, I grab the values of the other two attributes (uniqueCode and ReturnAttr3) and stuff them in attributes on the Current User object I am processing.  If not, it will continue looping through the objects.  Once the loop is finished, I want to verify that I found a result and kick back a trace message and veto if I did not find a match.

            <do-if>
                <arg-conditions>
                    <or>
                        <if-op-attr name="uniqueCode" op="not-available"/>
                        <if-op-attr name="Attr3" op="not-available"/>
                        <if-op-attr mode="nocase" name="uniqueCode" op="equal"/>
                        <if-op-attr mode="nocase" name="Attr3" op="equal"/>
                    </or>
                </arg-conditions>
                <arg-actions>
                    <do-trace-message level="1">
                        <arg-string>
                            <token-text xml:space="preserve">No matching facility object found, uniqueCode and Attr3 not set.  Veto'ing transaction.</token-text>
                        </arg-string>
                    </do-trace-message>
                    <do-veto/>
                </arg-actions>
                <arg-actions/>
            </do-if>
        </actions>
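
For anyone who finds the policy XML hard to follow, the loop, the match, and the final veto check boil down to something like the Python below.  It is a sketch of the logic only, not of how the IDM engine works.  The results are plain dictionaries, resolve_facility is a made-up name, and it keeps the last matching object on the assumption that a later set overwrites an earlier one, since the policy has no break in its loop.

```python
def resolve_facility(results, gcv_attr1):
    """Mimic the for-each / if / veto flow of the policy.

    results:   list of dicts with ReturnAttr1, uniqueCode, ReturnAttr3 keys
    gcv_attr1: the value of the GCV the policy compares against
    Returns the values to set on the user, or None when the policy would veto.
    """
    chosen = None
    for node in results:
        value = node.get("ReturnAttr1") or ""
        # mode="nocase" in the policy: compare case-insensitively
        if value.lower() == gcv_attr1.lower():
            chosen = {
                "uniqueCode": node.get("uniqueCode"),
                "Attr3": node.get("ReturnAttr3"),
            }
    # Final check: a missing or empty value means no usable match was found
    if chosen is None or not chosen["uniqueCode"] or not chosen["Attr3"]:
        return None
    return chosen
```

A return of None corresponds to the trace message and veto at the end of the rule.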

That's all there is to it!  The XPATH query made it easy to run off and grab data out of eDirectory that was not previously available to me.  I can pick it up and use other DirXML logic to process what I have very easily.

Thursday, March 11, 2010

Novell IDM XSL - Change Attribute to Proper Case

Today I was working on a Novell IDM project and I needed to use some XSL to call an external Java class to format some text.  So, when I created my policy, I created an XSLT policy instead of a standard DirXML policy.  My input values looked something like the following:

        <modify-attr attr-name="Given Name">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Given Name">
            <value>
                ROBERT
            </value>
        </modify-attr>
        <modify-attr attr-name="Surname">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Surname">
            <value>
                IVEY
            </value>
        </modify-attr>
        <modify-attr attr-name="Full Name">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Full Name">
            <value>
                ROBERT IVEY
            </value>
        </modify-attr>


So, here is what the meat of my XSL policy looked like to transform it.  First, we find a match on the tags we want and store the attribute name and its value in some local variables:

    <xsl:template match="add-attr[@attr-name='Given Name'] | add-attr[@attr-name='Surname']">
        <xsl:variable name="attrName" select="./@attr-name"/>
        <xsl:variable name="newVal" select="./value/text()"/>

The next thing we do is ensure that there is a value.  This is done because on modify events there are two tags, the <remove-all-values/> one and the one with the new value to add.  We don't want to send a blank value and output two different tags.

        <xsl:choose>
            <xsl:when test="$newVal">

Now that we have matched the tag and ensured there is a value, let's go ahead and write the new version of the XML element and call the template to replace the value.  We can use an otherwise statement to copy everything that isn't being replaced (i.e. the <remove-all-values/> tags that we matched but didn't rewrite) and close up all of our xsl tags.

                <add-attr attr-name="{$attrName}">
                    <value>
                        <xsl:call-template name="convertCase">
                            <xsl:with-param name="UCData" select="$newVal"/>
                        </xsl:call-template>
                    </value>
                </add-attr>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

Let's have a look at the xsl template "convertCase" that's being called.  It's very simple and calls some Java methods that are included using a jar file that we added to our IDM server.

    <xsl:template name="convertCase">
        <xsl:param name="UCData"/>
        <xsl:variable name="LCData" select="util:lowerString($UCData)"/>
        <xsl:variable name="newData" select="util:capitalizeWords($LCData)"/>
        <xsl:value-of select="$newData"/>
    </xsl:template>

The parameter UCData is passed to the method util:lowerString and stored in LCData.  Then, LCData is passed over to util:capitalizeWords and the new value is stored in newData.  Notice how newData is the selected value that we use in the replacement up in the xsl template for the modify.
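
Since the Java behind util:lowerString and util:capitalizeWords is not available to me, here is a rough Python equivalent of what the two calls appear to do together: lowercase everything, then capitalize the first letter of each word.  This is a guess at the behavior based on the input and output shown in this post, not the consultant's implementation, and proper_case is a made-up name.

```python
def proper_case(text):
    """Lowercase the text, then capitalize the first letter of each word.

    Splitting on whitespace also trims the value and collapses repeated
    spaces, which may differ from what the real Java methods do.
    """
    return " ".join(word.capitalize() for word in text.lower().split())
```

One limitation worth knowing: names with internal capitals or punctuation are not handled specially, so O'BRIEN comes back as O'brien and MARY-ANN as Mary-ann.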

This template is added to copy through everything we didn't match:

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

And our output should be exactly the same, except our snippet was modified to look like this:


        <modify-attr attr-name="Given Name">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Given Name">
            <value>
                Robert
            </value>
        </modify-attr>
        <modify-attr attr-name="Surname">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Surname">
            <value>
                Ivey
            </value>
        </modify-attr>
        <modify-attr attr-name="Full Name">
             <remove-all-values/>
        </modify-attr>
        <modify-attr attr-name="Full Name">
            <value>
                Robert Ivey
            </value>
        </modify-attr>

Please note, the Java classes are custom code delivered by a consultant.  I do not have the source code, nor can I distribute this code without their permission.  A little time with a string tokenizer should help recreate this functionality, but I am by no means a Java programmer.  Here is what all of our code looks like when we slap it together:

    <xsl:template match="add-attr[@attr-name='Given Name'] | add-attr[@attr-name='Surname']">
        <xsl:variable name="attrName" select="./@attr-name"/>
        <xsl:variable name="newVal" select="./value/text()"/>
        <xsl:choose>
            <xsl:when test="$newVal">
                <add-attr attr-name="{$attrName}">
                    <value>
                        <xsl:call-template name="convertCase">
                            <xsl:with-param name="UCData" select="$newVal"/>
                        </xsl:call-template>
                    </value>
                </add-attr>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    <xsl:template name="convertCase">
        <xsl:param name="UCData"/>
        <xsl:variable name="LCData" select="util:lowerString($UCData)"/>
        <xsl:variable name="newData" select="util:capitalizeWords($LCData)"/>
        <xsl:value-of select="$newData"/>
    </xsl:template>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
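
For comparison, the same idea (rewrite the value on the elements we care about, pass everything else through untouched) can be sketched outside of XSL.  The Python below is only an illustration of the logic.  The wrapper element, the sample values, and the names transform and proper_case are made up, and the tag name is a parameter because the templates above match add-attr while the sample input shows modify-attr; the sketch defaults to modify-attr to line up with the sample.

```python
import xml.etree.ElementTree as ET

# Attribute names the XSL template matches on
TARGET_ATTRS = {"Given Name", "Surname"}

# Hypothetical input, trimmed down from the sample earlier in this post
SAMPLE = """
<modify>
  <modify-attr attr-name="Given Name"><remove-all-values/></modify-attr>
  <modify-attr attr-name="Given Name"><value>ROBERT</value></modify-attr>
  <modify-attr attr-name="Full Name"><value>ROBERT IVEY</value></modify-attr>
</modify>
"""


def proper_case(text):
    """Stand-in for the Java helpers: lowercase, then capitalize each word."""
    return " ".join(word.capitalize() for word in text.lower().split())


def transform(xml_text, tag="modify-attr"):
    """Proper-case the <value> of matching elements; leave the rest as-is."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        if elem.get("attr-name") not in TARGET_ATTRS:
            continue  # not one of ours: copied through unchanged
        value = elem.find("value")
        # Same role as the xsl:when test: only rewrite when a value exists
        if value is not None and value.text and value.text.strip():
            value.text = proper_case(value.text)
    return root
```

Note that in this sketch Full Name is left alone because it is not in TARGET_ATTRS, just as the XSL template only matches Given Name and Surname.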