Friday, January 16, 2009

Replace failed disk in Veritas

To replace a failed disk under Veritas Volume Manager, follow the procedure below.

1. Check the failed disk using the command 'vxdisk list'

2. Run the 'format' command to see whether the disk is reported as offline or as 'not responding to selection'.

3. Log a service call with the hardware vendor.

4. Remove the failed disk from Volume Manager control as follows.

a. Run 'vxdiskadm' as root.

b. Choose option 4: Remove a disk for replacement.

c. Choose the logical name corresponding to the disk that has failed (for example, data02).

5. Get the disk replaced by the vendor.

6. Make sure the new disk appears healthy in the 'format' command (there is no need to partition it).

7. Run 'vxdctl enable' so that vxconfigd rescans the devices and detects the replaced disk.

8. Run 'vxdiskadm' again and follow the steps below.

a. Choose option 5: Replace a failed or removed disk.

b. Choose the disk that was removed in step 4 (for example, data02).

c. Choose the device corresponding to the logical name (for example, c1t10d0).

d. Say no to 'encapsulate' and choose okay to initialise the disk to replace the failed one.

e. Accept the default (no) for the FMR plex resync option.

f. Once the prompt reports that the replacement completed successfully, exit vxdiskadm.

9. Check the disks are online by running 'vxdisk list'.


10. Check the volume, plex and subdisk states with 'vxprint -ht'.
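Steps 1 and 9 both come down to reading the 'vxdisk list' output, so a small filter can do the looking. The sketch below is only an illustration: the function names and the sample listing are made up, and it assumes the usual five-column layout (DEVICE TYPE DISK GROUP STATUS).

```shell
#!/bin/sh
# Hypothetical helper: print every disk whose STATUS is not plain "online".
# Assumes the five-column 'vxdisk list' layout: DEVICE TYPE DISK GROUP STATUS.
not_online_disks() {
    awk 'NR > 1 {
        # STATUS may be several words (e.g. "failed was:c1t10d0s2").
        status = $5
        for (i = 6; i <= NF; i++) status = status " " $i
        if (status != "online") print $3 ": " status
    }'
}

# Illustrative sample only (not captured from a real system).
sample_vxdisk_list() {
    cat <<'EOF'
DEVICE       TYPE      DISK         GROUP        STATUS
c1t9d0s2     sliced    data01       datadg       online
-            -         data02       datadg       failed was:c1t10d0s2
EOF
}

sample_vxdisk_list | not_online_disks
```

On a real host the call would be 'vxdisk list | not_online_disks'. Empty output means every listed disk is online.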

Moving hot-relocated subdisks back to the original disk

# vxdiskadm

Choose option 14


Move hot-relocated subdisks back to a disk
Menu: VolumeManager/Disk/UnrelocateDisk
Use this operation to move subdisks which were hot-relocated back
onto the original disk that has been replaced due to a disk failure.
This operation takes, as input, the original disk name. If the
failed drive was replaced with a disk using a different name, this
operation also provides an option to specify the new name.
Enter the original disk name [,list,q,?] list
datadg0211
datadg03

Enter the original disk name [,list,q,?] datadg0211
Unrelocate to a new disk [y,n,q,?] (default: n)
Requested operation is to move all the subdisks which were hot-relocated
from datadg0211 back to datadg0211 of disk group datadg02.
Continue with operation? [y,n,q,?] (default: y)
Use -f option to unrelocate the subdisks if moving to the exact offset fails?
[y,n,q,?] (default: n)
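The same move can also be requested from the command line with vxunreloc instead of the menu. The wrapper below is a hedged sketch: the function name and the DRY_RUN switch are my own, and by default it only prints the command it would run so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper around vxunreloc; prints the command unless DRY_RUN=0.
# Usage: unrelocate DISKGROUP ORIGINAL_DISK [NEW_DISK]
unrelocate() {
    dg=$1
    orig=$2
    new=${3:-}
    if [ -z "$dg" ] || [ -z "$orig" ]; then
        echo "usage: unrelocate DISKGROUP ORIGINAL_DISK [NEW_DISK]" >&2
        return 2
    fi
    set -- vxunreloc -g "$dg"
    # -n names a different destination disk ("Unrelocate to a new disk").
    if [ -n "$new" ]; then
        set -- "$@" -n "$new"
    fi
    set -- "$@" "$orig"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"          # really run vxunreloc
    else
        echo "$*"     # dry run: show the command only
    fi
}

unrelocate datadg02 datadg0211
```

With the names from the session above this prints 'vxunreloc -g datadg02 datadg0211'. Set DRY_RUN=0 to actually run it.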

Thursday, January 15, 2009

Go to ok prompt from ILOM of Sun T5120

Follow the procedure below to get to the "ok" prompt from ILOM.

1. ssh to the ILOM hostname.


2. From the ILOM prompt, type the commands below. Starting the console brings you to the ok prompt.

--> set /HOST send_break_action=break

--> start /SP/console


Manual system reset from the ILOM prompt. The bootmode script turns off auto-boot, so the system stops at the ok prompt after the reset.

--> set /HOST/bootmode script="setenv auto-boot? false"

--> reset /SYS

SUN M4000 ALOM

Sun's M4000 server has a newer management interface called XSCF (eXtended System Control Facility). It's different from the usual "sc" prompt of some low-end servers.


Log on to the XSCF of the Sun M4000:

# ssh or
username: eis-installer or your_username
password: password


To connect to console:

XSCF> console -d 0

If somebody is already using the console, you can force the connection:

XSCF> console -d 0 -f

To go back to XSCF prompt:

type "#." (without the quotes)

To reset the server/domain:

XSCF> reset -d 0 por [resets domain 0]
XSCF> reset -d 0 xir [resets domain 0 with XIR reset]

To send break:

XSCF> sendbreak -d 0

To reboot XSCF system:

XSCF> rebootxscf


Other commands below:


XSCF> showstatus
XSCF> showversion -c xcp -v [shows xcp firmware version, openboot prom version]
XSCF> showenvironment
XSCF> showenvironment temp
XSCF> showenvironment volt
XSCF> showhardconf
XSCF> showdcl -va [check domain id...]
XSCF> showdomainstatus -a
XSCF> showboards -a
XSCF> poweron -a [powers up all domains]
XSCF> poweroff -a [powers off all domains]
XSCF> poweron -d 0 [powers on domain 0]
XSCF> poweroff -d 0 [powers off domain 0]
XSCF> poweroff -f -d 0 [forces a power off domain 0]
XSCF> sendbreak -d 0 [sends break command to domain 0]
XSCF> setautologout -s 60 [sets autologout to 60 minutes]
XSCF> showautologout
XSCF> shownetwork -a
XSCF> setnetwork xscf#0-lan#0 -m 255.255.255.0 10.10.10.5
XSCF> sethostname xscf#0 fire-xscf
XSCF> sethostname -d host.org [sets the DNS domain name]
XSCF> setroute -h host.org
XSCF> setnameserver 10.10.10.2 10.10.10.3
XSCF> setroute -c add -n 10.10.10.1 -m 255.255.255.0 xscf#0-lan#0

To add 2 additional memory boards:

XSCF> addboard -c assign -d 0 00-2
XSCF> addboard -c assign -d 1 00-3

XSCF> showboards -va
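If you save the output of 'showdomainstatus -a' from a session, a short filter can flag domains that need attention. This is only a sketch: the function names are hypothetical and the sample listing is an assumed two-column layout (domain ID, then status, with "-" for an unconfigured domain), not output captured from a real M4000.

```shell
#!/bin/sh
# Hypothetical filter: list domain IDs whose status is not "Running".
# Assumes two columns: DID, then the status text; "-" means unconfigured.
domains_not_running() {
    awk 'NR > 1 && $2 != "-" {
        # The status may be more than one word (e.g. "Powered Off").
        status = $2
        for (i = 3; i <= NF; i++) status = status " " $i
        if (status != "Running") print $1 ": " status
    }'
}

# Illustrative sample only.
sample_showdomainstatus() {
    cat <<'EOF'
DID         Domain Status
00          Running
01          Powered Off
02          -
EOF
}

sample_showdomainstatus | domains_not_running
```

For the sample above the filter reports only domain 01.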

Veritas Netbackup Client Installation

The NetBackup client must be installed on every server that you want to back up.

For Linux

1. Check whether the NetBackup client is already installed

# rpm -qa | grep nbu

2. If it is not installed yet, install the NetBackup client rpm package

# rpm -ivh SYSnbuc-6.0-4.i386.rpm (or the name of the package)

3. Edit bp.conf

# cd /usr/openv/netbackup
# vi bp.conf (remove everything and put your backup server's hostname)

SERVER = hostname

4. Create an exclude_list file in the same directory. It excludes the listed files or directories from the backup. See the example below.
[root@hostname] cat exclude_list
#Sparse file that can take 45 mins to process.
/var/log/lastlog
#2.6 kernel introduces the /sys tree which cannot be backed up
/sys

5. Check route of client to netbackup server

# netstat -rn (check routing table of the client)
If the route from the client to the backup server doesn't exist, create a static route, for example:

route add -net 172.24.16.0 netmask 255.255.252.0 gw 172.24.27.1

Add this command to /etc/rc.local so the route is added at boot.

# ssh backup_server (connect to the backup server via ssh)
# ping backup_client (confirm the server can reach the client)
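Whether the static route in step 5 covers the backup server is a matter of netmask arithmetic: 255.255.252.0 is a /22, so 172.24.16.0 spans 172.24.16.0 through 172.24.19.255. The sketch below does that check in plain shell. The function names and the server address 172.24.17.20 are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when address $1 lies inside network $2 with netmask $3.
in_network() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# 172.24.17.20 is a hypothetical backup server address.
if in_network 172.24.17.20 172.24.16.0 255.255.252.0; then
    echo "covered by the static route"
else
    echo "not covered by the static route"
fi
```

The gateway 172.24.27.1 fails the same test, which is expected: it sits on the client's own network, not on the network being routed to.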


For Sun Solaris 8/10

1. Check whether the NetBackup client is already installed: look for /opt/openv or /usr/openv
2. If neither exists, install the package
# pkgadd -d . netbackcl

3. Edit bp.conf
# cd /usr/openv/netbackup/
# vi bp.conf (remove everything and put below entry)
--

SERVER = hostname
---

4. Create the exclude_list file

For Solaris 8:

/proc/
/tmp/
/cdrom/
core
/home/
/apps/
/builds/
/patches/
/packages/

For Solaris 10:

#Start of OS standard excludes
/*arch*/
/*ora*dump*/
/ldoms/*
/oracrs*/*
/oravote*/*
/tmp/*
/var/SUNWsrspx/SRSQueueStore/store/.free
/var/SUNWsrspx/SUNWsrspx/SRSQueueStore/store/.free
/var/crash/*
/var/opt/SUNWsrspx/SRSQueueStore/store/.free
/var/tmp/*
#End of OS standard excludes
#Host specific excludes below
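Steps 3 and 4 are the same on Linux and Solaris, so they are easy to script. The helpers below are a sketch, not a vendor tool: the function names are hypothetical, the demo writes to a scratch directory instead of /usr/openv/netbackup, and add_exclude assumes a POSIX grep (on Solaris that is probably /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
#!/bin/sh
# Hypothetical helper: write a one-line bp.conf naming the backup server.
# $1 = backup server hostname, $2 = NetBackup directory (defaults to the
# path used in this post).
write_bp_conf() {
    dir=${2:-/usr/openv/netbackup}
    printf 'SERVER = %s\n' "$1" > "$dir/bp.conf"
}

# Append pattern $1 to exclude_list file $2 unless it is already listed.
add_exclude() {
    file=${2:-/usr/openv/netbackup/exclude_list}
    touch "$file"
    # -F fixed string, -x whole line, -q quiet
    grep -Fxq -e "$1" "$file" || printf '%s\n' "$1" >> "$file"
}

# Demo in a scratch directory so nothing under /usr/openv is touched.
demo_dir=${TMPDIR:-/tmp}/nbu_demo.$$
mkdir -p "$demo_dir"
write_bp_conf backup_server "$demo_dir"
add_exclude '/tmp/*' "$demo_dir/exclude_list"
add_exclude '/tmp/*' "$demo_dir/exclude_list"   # second call adds nothing
cat "$demo_dir/bp.conf" "$demo_dir/exclude_list"
```

Calling add_exclude twice with the same pattern leaves a single entry, so the script can be re-run safely when adding host-specific excludes.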

Building Sun Fire X4150 x86 machine

Building an X4150 is a little different from building Sun Fire V-series machines. To build this machine, follow the steps below:

1. Configure the ILOM (Integrated Lights Out Manager) service to gain remote access to the server. A Sun engineer can help set this up. You may also ask the engineer to enable the web GUI, usually at https://ILOM_hostname or https://ILOM_ipaddress

2. Connect to the ILOM via ssh or telnet (depending on what has been enabled)

# ssh ILOM_hostname (or ILOM_ipaddress)

or via the web GUI, https://ILOM_hostname

3. Launch the server console by clicking the "Launch Redirection" button under the "Remote Control" tab.

4. Reset the machine to configure the BIOS and RAID. Click the "Remote Power Control" tab, select "Reset" in the Power Control field, and then click "Save".

5. In the BIOS configuration menu, go to Server to check the NICs' MAC addresses (you'll need these for the Jumpstart process).

6. Go to Boot and then select Boot Device Priority. Select "USB:Virtual DVD/CD" as the "1st Boot Device" and the "RAID disk" as the "2nd Boot Device". Save and exit the BIOS.

7. Configure Jumpstart to create an initial boot ISO.

8. Mount the ISO image. From the ILOM remote console, select "Devices", click "CD-ROM image", and then select the ISO file.

9. Proceed with the Jumpstart.

Note: Different errors might occur during Jumpstart, such as a "Disk not found" error. For that one, check whether RAID has been configured. For ....... error (hehehe) you may want to check your tftp package or check your network connection.

AutoSys Admin Commands

A collection of AutoSys admin commands that I use to manage my company's AutoSys infrastructure.

First, set up aliases to make your life easier!

Below is how you set up aliases in C shell. If you're using a different shell, then, RTFM!

Add the aliases to your profile (for example, ~/.cshrc)

# Send Event
alias se sendevent -E

# Start Job
alias fsj sendevent -E FORCE_STARTJOB -J
alias sj sendevent -E STARTJOB -J

# Job Report
alias jr autorep -J

# Machine Report
alias mr autorep -M
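
For Bourne-style shells (sh, ksh, bash) the same shortcuts can be written as functions. This is a minimal sketch with the same names as the C shell aliases. It assumes the AutoSys binaries sendevent and autorep are on your PATH.

```shell
#!/bin/sh
# Bourne-shell equivalents of the C shell aliases above.
# Assumes the AutoSys binaries sendevent and autorep are on PATH.

# Send Event
se()  { sendevent -E "$@"; }

# Start Job
fsj() { sendevent -E FORCE_STARTJOB -J "$@"; }
sj()  { sendevent -E STARTJOB -J "$@"; }

# Job Report
jr()  { autorep -J "$@"; }

# Machine Report
mr()  { autorep -M "$@"; }
```

Put these in ~/.profile (or ~/.bashrc). Functions pass every argument through, so 'jr jobname -q' still expands to 'autorep -J jobname -q'.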

Then just run the following commands:

To view job details,
# jr jobname -q

To view the full job name and its box job
# jr jobname -w

To page through the job report
# jr nbu.tok.prd% | pg

To force start a job
# fsj jobname

To ICE a job
# sendevent -E JOB_ON_ICE -J jobname

To un-ICE a job
# sendevent -E JOB_OFF_ICE -J jobname

To kill a job
# sendevent -E KILLJOB -J jobname

To mark a job as success
# sendevent -E CHANGE_STATUS -s SUCCESS -J jobname

To mark a job as terminated
# sendevent -E CHANGE_STATUS -s TERMINATED -J jobname

To check the job details
# jr jobname -d

To delete a job
# cat job.jil
delete_job: jobname
# jil < job.jil

To update a job
# cat job.jil
update_job: jobname
description: "New Description"
# jil < job.jil

To set up a backup job
# jr existing_job -q > newjob.jil (copy existing job)
# vi newjob.jil (edit entries appropriately)
# jil < newjob.jil (load the job)
# sendevent -E JOB_ON_ICE -J jobname (ice the job)
# fsj jobname (force start the job)

To rename a job
# jr old_job_name -q > new_job_name.jil (dump the existing definition)
# vi new_job_name.jil (replace the old name with the new one, then save the file)
# jil < new_job_name.jil (load the renamed job)
Then delete old_job_name:
# jil
delete_job: old_job_name (press Enter)
(press Ctrl-D to exit jil)
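
The editing step of the rename can be done with a filter instead of vi. The sketch below is an assumption-laden example: the function name is made up, and the sample definition only imitates the 'insert_job: name   job_type: c' first line that 'autorep -q' prints. It changes the name on the insert_job line and nothing else, so a box_name or condition that mentions the old job still needs a manual check.

```shell
#!/bin/sh
# Hypothetical helper: read a job definition (as printed by 'autorep -q')
# on stdin and write it back with the job renamed.
# $1 = old job name, $2 = new job name.
rename_job_jil() {
    awk -v old="$1" -v new="$2" '
        # Only touch the insert_job line that names exactly the old job.
        $1 == "insert_job:" && $2 == old {
            sub(/insert_job:[ \t]*[^ \t]+/, "insert_job: " new)
        }
        { print }
    '
}

# Illustrative sample only.
sample_jil() {
    cat <<'EOF'
insert_job: old_job_name   job_type: c
command: /bin/true
machine: hostname
EOF
}

sample_jil | rename_job_jil old_job_name new_job_name
```

In practice the input would come from 'autorep -J old_job_name -q', with the result redirected to new_job_name.jil and loaded with 'jil < new_job_name.jil' before the old job is deleted.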

That's it!!!