Overview

In a normally operating 11.2.0.4 Grid environment, killing crsd.bin causes it to restart immediately.

This situation is unlikely in practice, but here we kill crsd as many times as it takes for it to stop restarting and see what happens.

This can occasionally be useful when only crsd is down.


 

Performing the kills. If crsd.bin is killed too quickly, crsd startup hangs, so the kills are spaced 2 seconds apart.

[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep
root     16554     1  2 11:56 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
[root@node01 ~]# ps -ef | grep crsd.bin | grep -v grep|  awk '{print $2}' | xargs kill -9
usage: kill [ -s signal | -p ] [ -a ] pid ...
       kill -l [ signal ]                                       -- fails because no crsd.bin process is left
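
For reference, a minimal loop equivalent to the manual commands above (a sketch assuming bash and pkill are available; not part of the original test):

# kill crsd.bin every 2 seconds until ohasd gives up restarting it
# (pkill returns non-zero once no crsd.bin process is left, ending the loop)
while pkill -9 -x crsd.bin; do
    sleep 2
done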


=> After killing it 11 times, it does not come back up. I could not find where crsd's maximum restart attempts value is documented; this test only shows that 11 is the max. If anyone knows another way to check it, please let me know.
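
One way that should work (an assumption based on the generic 11.2 resource attributes, not something verified in this test) is to dump the full attributes of the ora.crsd init resource and look for RESTART_ATTEMPTS:

[root@node01 ~]# crsctl stat res ora.crsd -init -f | grep RESTART_ATTEMPTS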


The tail end of the CRS alert log

..... (omitted) .....

2016-10-12 11:58:09.533:
[crsd(17859)]CRS-1201:CRSD started on node node01.
2016-10-12 11:58:10.123:
[ohasd(15872)]CRS-2765:Resource 'ora.crsd' has failed on server 'node01'.
2016-10-12 11:58:10.123:
[ohasd(15872)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.

 

Checking resource status

[root@node01 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@node01 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       node01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node01
ora.crf
      1        ONLINE  ONLINE       node01
ora.crsd
      1        ONLINE  OFFLINE

ora.cssd
      1        ONLINE  ONLINE       node01
ora.cssdmonitor
      1        ONLINE  ONLINE       node01
ora.ctssd
      1        ONLINE  ONLINE       node01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       node01
ora.evmd
      1        ONLINE  ONLINE       node01
ora.gipcd
      1        ONLINE  ONLINE       node01
ora.gpnpd
      1        ONLINE  ONLINE       node01
ora.mdnsd
      1        ONLINE  ONLINE       node01

 
Attempting stop and start to restore normal operation

[root@node01 ~]# crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
-- It complains that Cluster Ready Services is not running.

 

[root@node01 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

-- This time it fails, saying OHAS is already active.

 

Starting only crsd

[root@node01 ~]# crsctl start resource ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'node01'
CRS-2676: Start of 'ora.crsd' on 'node01' succeeded      

 

Verification

[root@node01 ~]# crsctl stat res -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       node01                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node01
ora.crf
      1        ONLINE  ONLINE       node01
ora.crsd
      1        ONLINE  ONLINE       node01

ora.cssd
      1        ONLINE  ONLINE       node01
ora.cssdmonitor
      1        ONLINE  ONLINE       node01
ora.ctssd
      1        ONLINE  ONLINE       node01                   ACTIVE:0
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       node01
ora.evmd
      1        ONLINE  ONLINE       node01
ora.gipcd
      1        ONLINE  ONLINE       node01
ora.gpnpd
      1        ONLINE  ONLINE       node01
ora.mdnsd
      1        ONLINE  ONLINE       node01
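
Now that ora.crsd is ONLINE again, the cluster-level listing that failed earlier with CRS-4535 should also work (a quick sanity check, not captured in the original log):

[root@node01 ~]# crsctl stat res -t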

runcluvfy post-check run results

[grid@node01 grid]$ ./runcluvfy.sh stage -post crsinst -n node01,node02 -verbose

Performing post-checks for cluster services setup

Checking node reachability...

Check: Node reachability from node "node01"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  node02                                yes
  node01                                yes
Result: Node reachability check passed from node "node01"


Checking user equivalence...

Check: User equivalence for user "grid"
  Node Name                             Status
  ------------------------------------  ------------------------
  node02                                passed
  node01                                passed
Result: User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...
  Node Name                             Status
  ------------------------------------  ------------------------
  node02                                passed
  node01                                passed

Verification of the hosts config file successful


Interface information for node "node02"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.30.22   192.168.30.0    0.0.0.0         192.168.30.2    00:0C:29:8A:2F:0E 1500
 eth0   192.168.30.28   192.168.30.0    0.0.0.0         192.168.30.2    00:0C:29:8A:2F:0E 1500
 eth0   192.168.30.24   192.168.30.0    0.0.0.0         192.168.30.2    00:0C:29:8A:2F:0E 1500
 eth1   192.168.137.22  192.168.137.0   0.0.0.0         192.168.30.2    00:0C:29:8A:2F:04 1500
 eth1   169.254.41.15   169.254.0.0     0.0.0.0         192.168.30.2    00:0C:29:8A:2F:04 1500


Interface information for node "node01"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth1   192.168.137.21  192.168.137.0   0.0.0.0         192.168.30.2    00:0C:29:07:37:4E 1500
 eth1   169.254.170.129 169.254.0.0     0.0.0.0         192.168.30.2    00:0C:29:07:37:4E 1500
 eth0   192.168.30.21   192.168.30.0    0.0.0.0         192.168.30.2    00:0C:29:07:37:58 1500
 eth0   192.168.30.23   192.168.30.0    0.0.0.0         192.168.30.2    00:0C:29:07:37:58 1500


Check: Node connectivity for interface "eth1"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  node02[192.168.137.22]          node01[192.168.137.21]          yes
Result: Node connectivity passed for interface "eth1"


Check: TCP connectivity of subnet "192.168.137.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  node01:192.168.137.21           node02:192.168.137.22           passed
Result: TCP connectivity check passed for subnet "192.168.137.0"


Check: Node connectivity for interface "eth0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  node02[192.168.30.22]           node02[192.168.30.28]           yes
  node02[192.168.30.22]           node02[192.168.30.24]           yes
  node02[192.168.30.22]           node01[192.168.30.21]           yes
  node02[192.168.30.22]           node01[192.168.30.23]           yes
  node02[192.168.30.28]           node02[192.168.30.24]           yes
  node02[192.168.30.28]           node01[192.168.30.21]           yes
  node02[192.168.30.28]           node01[192.168.30.23]           yes
  node02[192.168.30.24]           node01[192.168.30.21]           yes
  node02[192.168.30.24]           node01[192.168.30.23]           yes
  node01[192.168.30.21]           node01[192.168.30.23]           yes
Result: Node connectivity passed for interface "eth0"


Check: TCP connectivity of subnet "192.168.30.0"
  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  node01:192.168.30.21            node02:192.168.30.22            passed
  node01:192.168.30.21            node02:192.168.30.28            passed
  node01:192.168.30.21            node02:192.168.30.24            passed
  node01:192.168.30.21            node01:192.168.30.23            passed
Result: TCP connectivity check passed for subnet "192.168.30.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.30.0".
Subnet mask consistency check passed for subnet "192.168.137.0".
Subnet mask consistency check passed.

Result: Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.30.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.30.0" for multicast communication with multicast group "230.0.1.0" passed.

Checking subnet "192.168.137.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.137.0" for multicast communication with multicast group "230.0.1.0" passed.

Check of multicast communication passed.
Check: Time zone consistency
Result: Time zone consistency check passed

Checking Oracle Cluster Voting Disk configuration...

ASM Running check passed. ASM is running on all specified nodes

Oracle Cluster Voting Disk configuration check passed

Checking Cluster manager integrity...


Checking CSS daemon...

  Node Name                             Status
  ------------------------------------  ------------------------
  node02                                running
  node01                                running

Oracle Cluster Synchronization Services appear to be online.

Cluster manager integrity check passed


UDev attributes check for OCR locations started...
Result: UDev attributes check passed for OCR locations


UDev attributes check for Voting Disk locations started...
Result: UDev attributes check passed for Voting Disk locations


Check default user file creation mask
  Node Name     Available                 Required                  Comment
  ------------  ------------------------  ------------------------  ----------
  node02        0022                      0022                      passed
  node01        0022                      0022                      passed
Result: Default user file creation mask check passed

Checking cluster integrity...

  Node Name
  ------------------------------------
  node01
  node02

Cluster integrity check passed


Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations


ASM Running check passed. ASM is running on all specified nodes

Checking OCR config file "/etc/oracle/ocr.loc"...

OCR config file "/etc/oracle/ocr.loc" check successful


Disk group for ocr location "+OCR" available on all the nodes


NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.

OCR integrity check passed

Checking CRS integrity...

Clusterware version consistency passed
The Oracle Clusterware is healthy on node "node02"
The Oracle Clusterware is healthy on node "node01"

CRS integrity check passed

Checking node application existence...

Checking existence of VIP node application (required)
  Node Name     Required                  Running?                  Comment
  ------------  ------------------------  ------------------------  ----------
  node02        yes                       yes                       passed
  node01        yes                       yes                       passed
VIP node application check passed

Checking existence of NETWORK node application (required)
  Node Name     Required                  Running?                  Comment
  ------------  ------------------------  ------------------------  ----------
  node02        yes                       yes                       passed
  node01        yes                       yes                       passed
NETWORK node application check passed

Checking existence of GSD node application (optional)
  Node Name     Required                  Running?                  Comment
  ------------  ------------------------  ------------------------  ----------
  node02        no                        no                        exists
  node01        no                        no                        exists
GSD node application is offline on nodes "node02,node01"

Checking existence of ONS node application (optional)
  Node Name     Required                  Running?                  Comment
  ------------  ------------------------  ------------------------  ----------
  node02        no                        yes                       passed
  node01        no                        yes                       passed
ONS node application check passed


Checking Single Client Access Name (SCAN)...
  SCAN Name         Node          Running?      ListenerName  Port          Running?
  ----------------  ------------  ------------  ------------  ------------  ------------
  node-cluster-scan  node02        true          LISTENER_SCAN1  1521          true

Checking TCP connectivity to SCAN Listeners...
  Node          ListenerName              TCP connectivity?
  ------------  ------------------------  ------------------------
  node01        LISTENER_SCAN1            yes
TCP connectivity to SCAN Listeners exists on all cluster nodes

Checking name resolution setup for "node-cluster-scan"...

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes...
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined
More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file
All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf"
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed


ERROR:
PRVG-1101 : SCAN name "node-cluster-scan" failed to resolve
  SCAN Name     IP Address                Status                    Comment
  ------------  ------------------------  ------------------------  ----------
  node-cluster-scan  192.168.30.28             failed                    NIS Entry

ERROR:
PRVF-4657 : Name resolution setup check for "node-cluster-scan" (IP address: 192.168.30.28) failed

ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "node-cluster-scan"

Verification of SCAN VIP and Listener setup failed

Checking OLR integrity...

Checking OLR config file...

OLR config file check successful


Checking OLR file attributes...

OLR file check successful


WARNING:
This check does not verify the integrity of the OLR contents. Execute 'ocrcheck -local' as a privileged user to verify the contents of OLR.

OLR integrity check passed
OCR detected on ASM. Running ACFS Integrity checks...

Starting check to see if ASM is running on all cluster nodes...

ASM Running check passed. ASM is running on all specified nodes

Starting Disk Groups check to see if at least one Disk Group configured...
Disk Group Check passed. At least one Disk Group configured

Task ACFS Integrity check passed

Checking to make sure user "grid" is not in "root" group
  Node Name     Status                    Comment
  ------------  ------------------------  ------------------------
  node02        passed                    does not exist
  node01        passed                    does not exist
Result: User "grid" is not part of "root" group. Check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status
  ------------------------------------  ------------------------
  node02                                passed
  node01                                passed
Result: CTSS resource check passed


Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed

Check CTSS state started...
Check: CTSS state
  Node Name                             State
  ------------------------------------  ------------------------
  node02                                Active
  node01                                Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
  Node Name     Time Offset               Status
  ------------  ------------------------  ------------------------
  node02        0.0                       passed
  node01        0.0                       passed

Time offset is within the specified limits on the following set of nodes:
"[node02, node01]"
Result: Check of clock time offsets passed


Oracle Cluster Time Synchronization Services check passed
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Post-check for cluster services setup was unsuccessful on all the nodes.
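
The only failures above are the SCAN name-resolution checks (PRVG-1101 / PRVF-4657 / PRVF-4664): the SCAN name resolves through a hosts/NIS entry to a single address rather than through round-robin DNS. A quick way to see how the SCAN actually resolves (a sketch, assuming nslookup is installed):

[grid@node01 ~]$ nslookup node-cluster-scan

When the SCAN is registered in /etc/hosts instead of DNS, cluvfy is expected to report exactly these errors, and in a test environment they are generally accepted.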


Replace OCR, votedisk to ASM Disk Group

This is the work log of moving the OCR and voting disk from the DATA disk group to the OCRVOTE disk group.

The OCRVOTE disk group was created with normal redundancy from three 1 GB disks.
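
For reference, a sketch of how such a disk group could be created (assuming the three disks are /dev/sde, /dev/sdf and /dev/sdg as shown in the votedisk listing below, and an 11.2 environment; COMPATIBLE.ASM must be 11.2 or higher for voting files to be stored in ASM):

[grid@mid1 ~]$ sqlplus / as sysasm
SQL> create diskgroup ocrvote normal redundancy
  2  disk '/dev/sde', '/dev/sdf', '/dev/sdg'
  3  attribute 'compatible.asm' = '11.2';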

 

[root@mid1 disk]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 316c6e2c446c4fb8bf67fcf82a17593d (/dev/sdb) [DATA]
Located 1 voting disk(s).
[root@mid1 disk]# crsctl replace votedisk +ocrvote
Successful addition of voting disk 13ff72d596864f24bf2244181516f700.
Successful addition of voting disk 6a48a54241b84fd4bf6b28dc717db393.
Successful addition of voting disk 5ae56b9f30e14ff6bf96110578b191b4.
Successful deletion of voting disk 316c6e2c446c4fb8bf67fcf82a17593d.
Successfully replaced voting disk group with +ocrvote.
CRS-4266: Voting file(s) successfully replaced
[root@mid1 disk]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 13ff72d596864f24bf2244181516f700 (/dev/sde) [OCRVOTE]
2. ONLINE 6a48a54241b84fd4bf6b28dc717db393 (/dev/sdf) [OCRVOTE]
3. ONLINE 5ae56b9f30e14ff6bf96110578b191b4 (/dev/sdg) [OCRVOTE]
Located 3 voting disk(s).

[root@mid1 disk]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2876
Available space (kbytes) : 259244
ID : 1140307615
Device/File Name : +DATA
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

[root@mid1 disk]# ocrconfig -add +ocrvote
[root@mid1 disk]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2876
Available space (kbytes) : 259244
ID : 1140307615
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File Name : +ocrvote
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

[root@mid1 disk]# ocrconfig -delete +data
[root@mid1 disk]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2876
Available space (kbytes) : 259244
ID : 1140307615
Device/File Name : +ocrvote
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded


Recreating a listener after adding a 10.2.0.3 home

CRS is 10.2.0.5.

The DB is 10.2.0.5.

In this situation there was a request to take the DB down to 10.2.0.3.

Since the DB was created with DBCA while at 10.2.0.5, its compatible parameter is 10.2.0.5, so a downgrade is not possible.

In the end we decided to install an additional 10.2.0.3 DB and have the data moved over with Data Pump.

But the listener was currently running out of the 10.2.0.5 engine, wasn't it?

So I removed the listener with netca from 10.2.0.5 and then ran netca from 10.2.0.3, but it refused to proceed, saying a listener with the same name already exists.

Hmm. Just in case, I restarted CRS and tried again, and it worked fine. Something registered in the OCR seems to have gotten tangled up.

To summarize:

 

1. Remove the listener with netca from the 10.2.0.5 home.

2. Create the listener with netca from the 10.2.0.3 home. If that works, you are done.

3. If it complains that a listener with the same name exists, run crsctl stop crs and crsctl start crs on every node, then try again (see the sketch below).
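
A minimal sketch of step 3 (run as root on each node in turn; crsctl is assumed to be in root's PATH):

# on every node, as root
crsctl stop crs
crsctl start crs
# then rerun netca from the 10.2.0.3 home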

 

※ In 11gR2 the listener itself runs from the GI home, so this problem does not occur.

Manually dropping a DB in a RAC environment

In a RAC environment, dropping a database with DBCA occasionally reports that the drop succeeded even though the database was not actually dropped cleanly.

For such cases, here is how to drop a DB manually in a RAC environment.

 

1. Stop the DB
If the spfile is in a shared location, the following is also needed (to make things easier):
SQL> create pfile from spfile;
SQL> create spfile from pfile;

 

srvctl stop database -d <dbname>

 

2. Remove from the OCR
srvctl remove instance -d RACDB -i RAC1

srvctl remove instance -d RACDB -i RAC2

srvctl remove database -d RACDB

 

3. Change the cluster_database setting

sqlplus / as sysdba

SQL> startup mount restrict exclusive;

SQL> alter system set cluster_database=false scope=spfile;

 

4. Restart in restricted mode

sqlplus / as sysdba

SQL> shut immediate

SQL> startup mount restrict exclusive;

 

5. Drop database

SQL> drop database;

 

6. Delete the remaining files: password file, init/spfile, leftover temp files, etc. (a sketch follows below)
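
A cleanup sketch for step 6 (file names are illustrative only, assuming DB name RACDB and instance RAC1 as in step 2; adjust per node and instance):

# as the oracle user, on each node
rm $ORACLE_HOME/dbs/orapwRAC1        # password file
rm $ORACLE_HOME/dbs/initRAC1.ora     # pfile
rm $ORACLE_HOME/dbs/spfileRAC1.ora   # local spfile, if one was created
rm -r $ORACLE_BASE/admin/RACDB       # dump/audit directories, if no longer needed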


RAC Failover Test 10.2.0.2 64bit on linux 4.5

################ Environment #############################

OS: Oracle Enterprise Linux 4.5 64bit

DBMS: Oracle 10.2.0.2 Enterprise Edition 64bit

CRS: 10.2.0.2 64bit

Opatch: none

VM: Virtual Box

Nodes: 2

Shared Storage: ocfs2

Time: using NTP

Services: web: idb1 preferred, idb2 available

          intra: idb2 preferred, idb1 available

CRS and service settings: defaults
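
For reference, a sketch of how services with this preferred/available layout are registered in 10.2 (assuming database name idb; this is not part of the original test log):

srvctl add service -d idb -s web -r idb1 -a idb2
srvctl add service -d idb -s intra -r idb2 -a idb1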

 

################ public line disconnection ######################

 

Normal state. The current master node is idb1.

 

[oracle@idb1 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

Result of disconnecting idb1's public line

 

[oracle@idb1 cssd]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     OFFLINE          

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     OFFLINE          

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb2   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

Even after restoring idb1's public line, nothing changes. The web service is not relocated, so the web service is unavailable. The VIP did fail over.

 

Bringing it up manually

[oracle@idb2 ~]$ srvctl start instance -d idb -i idb1

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

The listener came back up automatically, the VIP automatically returned to idb1, and the web service was moved back to idb1 as well.

 

##################### interconnect disconnection ########################

idb1 is currently the master node.

 

[oracle@idb1 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

Disconnecting idb1's interconnect

 

idb2 ocssd.log

 

[    CSSD]2013-06-28 14:15:11.422 [213005] >TRACE:   clssnmPollingThread: node idb1 (1) missed(59) checkin(s)

[    CSSD]2013-06-28 14:15:12.424 [213005] >TRACE:   clssnmPollingThread: node idb1 (1) is impending reconfig

[    CSSD]2013-06-28 14:15:12.424 [213005] >TRACE:   clssnmPollingThread: Eviction started for node idb1 (1), flags 0x000f, state 3, wt4c 0

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmDoSyncUpdate: Initiating sync 2

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2013-06-28 14:15:12.424 [245775] >TRACE:   clssnmSendSync: syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.424 [131080] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[idb2] seq[1] sync[2]

[    CSSD]2013-06-28 14:15:12.424 [131080] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmWaitForAcks: node(1) is expiring, msg type(11)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmDoSyncUpdate: node(0) missCount(7677) state(0)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmDoSyncUpdate: node(1) missCount(60) state(3)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmSendVote: syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.425 [131080] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(0)

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2013-06-28 14:15:12.425 [245775] >TRACE:   clssnmCheckDskInfo: node(1) timeout(570) state_network(0) state_disk(3) missCount(60)

[    CSSD]2013-06-28 14:15:12.489 [16384] >USER:    NMEVENT_SUSPEND [00][00][00][06]

[    CSSD]2013-06-28 14:15:12.859 [81925] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(7679) LATS(6867374) Disk lastSeqNo(7679)

[    CSSD]2013-06-28 14:15:13.427 [245775] >TRACE:   clssnmCheckDskInfo: node(1) disk HB found, network state 0, disk state(3) missCount(61)

[    CSSD]2013-06-28 14:15:13.862 [81925] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(7680) LATS(6868374) Disk lastSeqNo(7680)

[    CSSD]2013-06-28 14:15:14.429 [245775] >TRACE:   clssnmCheckDskInfo: node(1) disk HB found, network state 0, disk state(3) missCount(62)

[    CSSD]2013-06-28 14:15:14.866 [81925] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(7681) LATS(6869374) Disk lastSeqNo(7681)

[    CSSD]2013-06-28 14:15:15.429 [245775] >TRACE:   clssnmCheckDskInfo: node(1) disk HB found, network state 0, disk state(3) missCount(63)

[    CSSD]2013-06-28 14:15:15.868 [81925] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(7682) LATS(6870384) Disk lastSeqNo(7682)

[    CSSD]2013-06-28 14:15:16.431 [245775] >ERROR:   clssnmCheckDskInfo: Terminating local instance to avoid splitbrain.

[    CSSD]2013-06-28 14:15:16.431 [245775] >ERROR:                 : Node(2), Leader(2), Size(1) VS Node(1), Leader(1), Size(1)

[    CSSD]2013-06-28 14:15:16.431 [245775] >TRACE:   clssscctx:  dump of 0x0x5d7750, len 3808

 

idb1 ocssd.log

 

[    CSSD]2013-06-28 14:15:11.307 [213005] >TRACE:   clssnmPollingThread: node idb2 (2) missed(59) checkin(s)

[    CSSD]2013-06-28 14:15:12.309 [213005] >TRACE:   clssnmPollingThread: node idb2 (2) is impending reconfig

[    CSSD]2013-06-28 14:15:12.309 [213005] >TRACE:   clssnmPollingThread: Eviction started for node idb2 (2), flags 0x000d, state 3, wt4c 0

[    CSSD]2013-06-28 14:15:12.309 [245775] >TRACE:   clssnmDoSyncUpdate: Initiating sync 2

[    CSSD]2013-06-28 14:15:12.309 [245775] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:15:12.309 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSendSync: syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.310 [131080] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[idb1] seq[5] sync[2]

[    CSSD]2013-06-28 14:15:12.310 [131080] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmWaitForAcks: node(2) is expiring, msg type(11)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmDoSyncUpdate: node(0) missCount(7682) state(0)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmDoSyncUpdate: node(2) missCount(60) state(3)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmSendVote: syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.310 [131080] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(2)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(0)

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2013-06-28 14:15:12.310 [245775] >TRACE:   clssnmCheckDskInfo: node(2) timeout(390) state_network(0) state_disk(3) missCount(60)

[    CSSD]2013-06-28 14:15:12.320 [16384] >USER:    NMEVENT_SUSPEND [00][00][00][06]

[    CSSD]2013-06-28 14:15:12.922 [81925] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(7672) LATS(6867784) Disk lastSeqNo(7672)

[    CSSD]2013-06-28 14:15:13.312 [245775] >TRACE:   clssnmCheckDskInfo: node(2) disk HB found, network state 0, disk state(3) missCount(61)

[    CSSD]2013-06-28 14:15:13.925 [81925] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(7673) LATS(6868784) Disk lastSeqNo(7673)

[    CSSD]2013-06-28 14:15:14.313 [245775] >TRACE:   clssnmCheckDskInfo: node(2) disk HB found, network state 0, disk state(3) missCount(62)

[    CSSD]2013-06-28 14:15:14.927 [81925] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(7674) LATS(6869784) Disk lastSeqNo(7674)

[    CSSD]2013-06-28 14:15:15.315 [245775] >TRACE:   clssnmCheckDskInfo: node(2) disk HB found, network state 0, disk state(3) missCount(63)

[    CSSD]2013-06-28 14:15:15.930 [81925] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(7675) LATS(6870784) Disk lastSeqNo(7675)

[    CSSD]2013-06-28 14:15:16.318 [245775] >TRACE:   clssnmCheckDskInfo: node(2) missCount(64) state(0). Smaller(1) cluster node 2. mine is 1. (2/1)

[    CSSD]2013-06-28 14:15:16.318 [245775] >TRACE:   clssnmEvict: Start

[    CSSD]2013-06-28 14:15:16.318 [245775] >TRACE:   clssnmEvict: Evicting node 2, birth 1, death 2, killme 1

[    CSSD]2013-06-28 14:15:16.318 [245775] >TRACE:   clssnmSendShutdown: req to node 2, kill time 6871174

[    CSSD]2013-06-28 14:15:16.318 [245775] >TRACE:   clssnmDiscHelper: node idb2 (2) connection failed

[    CSSD]2013-06-28 14:15:16.319 [81925] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(7676) LATS(6871174) Disk lastSeqNo(7676)

[    CSSD]2013-06-28 14:15:16.321 [245775] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2013-06-28 14:15:46.380 [245775] >WARNING: clssnmWaitOnEvictions: Unconfirmed dead node count 1

[    CSSD]2013-06-28 14:15:46.380 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2013-06-28 14:15:46.380 [245775] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE

[    CSSD]2013-06-28 14:15:46.380 [245775] >TRACE:   clssnmSendUpdate: syncSeqNo(2)

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmDeactivateNode: node 0 () left cluster

 

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmUpdateNodeState: node 1, state (3/3) unique (1372388811/1372388811) prevConuni(0) birth (1/1) (old/new)

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmUpdateNodeState: node 2, state (0/0) unique (1372388818/1372388818) prevConuni(1372388818) birth (1/0) (old/new)

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmDeactivateNode: node 2 (idb2) left cluster

 

[    CSSD]2013-06-28 14:15:46.381 [131080] >USER:    clssnmHandleUpdate: SYNC(2) from node(1) completed

[    CSSD]2013-06-28 14:15:46.381 [131080] >USER:    clssnmHandleUpdate: NODE 1 (idb1) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2013-06-28 14:15:46.381 [131080] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2013-06-28 14:15:46.381 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(0)

[    CSSD]2013-06-28 14:15:46.381 [245775] >TRACE:   clssnmDoSyncUpdate: Sync Complete!

[    CSSD]2013-06-28 14:15:46.400 [278544] >TRACE:   clssgmReconfigThread:  started for reconfig (2)

[    CSSD]2013-06-28 14:15:46.400 [278544] >USER:    NMEVENT_RECONFIG [00][00][00][02]

[    CSSD]2013-06-28 14:15:46.401 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2013-06-28 14:15:46.401 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(crs_version) birth(1/1)

[    CSSD]2013-06-28 14:15:46.401 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_idb type 2

[    CSSD]2013-06-28 14:15:46.401 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_idb type 3

[    CSSD]2013-06-28 14:15:46.401 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_idb type 2

[    CSSD]2013-06-28 14:15:46.402 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_idb) birth(1/1)

[    CSSD]2013-06-28 14:15:46.402 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_idb type 3

[    CSSD]2013-06-28 14:15:46.402 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_idb) birth(1/1)

[    CSSD]2013-06-28 14:15:46.402 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock DBIDB type 2

[    CSSD]2013-06-28 14:15:46.403 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DBIDB) birth(1/1)

[    CSSD]2013-06-28 14:15:46.403 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock DGIDB type 2

 

Both nodes tried to evict each other, but the evicted node was the one that was not the master (idb2).

 

crsstat status

[oracle@idb1 cssd]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     OFFLINE          

ora.idb.intra.cs                              ONLINE     ONLINE on idb1   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb1   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     OFFLINE          

ora.idb2.gsd                                  ONLINE     OFFLINE          

ora.idb2.ons                                  ONLINE     OFFLINE          

ora.idb2.vip                                  ONLINE     ONLINE on idb1   

 

Afterwards idb2 reboots and comes back up, but since the interconnect is still down, nothing changes.

 

If OCFS2 is in use and its traffic runs over the interconnect interface, idb2 cannot even mount the disk.

 

[root@idb2 ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda1              17G  7.7G  8.0G  49% /

none                  754M     0  754M   0% /dev/shm

/dev/sda3             373M   92M  262M  27% /var

[root@idb2 ~]# cat /etc/fstab

# This file is edited by fstab-sync - see 'man fstab-sync' for details

LABEL=/                 /                       ext3    defaults        1 1

none                    /dev/pts                devpts  gid=5,mode=620  0 0

none                    /dev/shm                tmpfs   defaults        0 0

none                    /proc                   proc    defaults        0 0

none                    /sys                    sysfs   defaults        0 0

LABEL=/var              /var                    ext3    defaults        1 2

LABEL=SWAP-sda2         swap                    swap    defaults        0 0

/dev/sdb                /oradata                ocfs2   _netdev,datavolume,nointr 0 0

/dev/hdc                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0

 

 

Restoring the interconnect

After some time, run mount -a or mount the volume manually.

 

[root@idb2 ~]# mount -a

[root@idb2 ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda1              17G  7.7G  8.0G  49% /

none                  754M     0  754M   0% /dev/shm

/dev/sda3             373M   92M  262M  27% /var

/dev/sdb              9.8G  1.4G  8.5G  14% /oradata

 

A little later, CRSD starts automatically. But although the VIP fails back, the services are not restored.

Even when connecting to idb2, connections through the web and intra services are impossible.

 

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb1   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb1   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

Restoring the intra service back to idb2

 

[oracle@idb2 ~]$ srvctl relocate service -d idb -s intra -i idb1 -t idb2

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb1   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

##################################### master node power off #######################################

The current master node is idb1.

 

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb1   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb1   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb1   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

Proceeding with power off

 

idb2 ocssd.log

 

[    CSSD]2013-06-28 14:42:12.569 [213005] >TRACE:   clssnmPollingThread: node idb1 (1) missed(59) checkin(s)

[    CSSD]2013-06-28 14:42:13.571 [213005] >TRACE:   clssnmPollingThread: node idb1 (1) is impending reconfig

[    CSSD]2013-06-28 14:42:13.571 [213005] >TRACE:   clssnmPollingThread: Eviction started for node idb1 (1), flags 0x000f, state 3, wt4c 0

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmDoSyncUpdate: Initiating sync 4

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2013-06-28 14:42:13.571 [245775] >TRACE:   clssnmSendSync: syncSeqNo(4)

[    CSSD]2013-06-28 14:42:13.571 [131080] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[idb2] seq[1] sync[4]

[    CSSD]2013-06-28 14:42:13.571 [131080] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmWaitForAcks: node(1) is expiring, msg type(11)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmDoSyncUpdate: node(0) missCount(734) state(0)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmDoSyncUpdate: node(1) missCount(60) state(3)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmSendVote: syncSeqNo(4)

[    CSSD]2013-06-28 14:42:13.572 [131080] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(4)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(0)

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2013-06-28 14:42:13.572 [245775] >TRACE:   clssnmCheckDskInfo: node(1) timeout(58670) state_network(0) state_disk(3) missCount(60)

[    CSSD]2013-06-28 14:42:13.643 [16384] >USER:    NMEVENT_SUSPEND [00][00][00][06]

[    CSSD]2013-06-28 14:42:14.573 [245775] >TRACE:   clssnmCheckDskInfo: node(1) timeout(59680) state_network(0) state_disk(3) missCount(61)

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmEvict: Start

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmEvict: Evicting node 1, birth 1, death 4, killme 1

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmEvict: Evicting Node(1), timeout(60000)

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmSendShutdown: req to node 1, kill time 659124

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmDiscHelper: node idb1 (1) connection failed

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmWaitOnEvictions: Node(1) down, LATS(599124),timeout(60000)

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2013-06-28 14:42:14.896 [245775] >TRACE:   clssnmSendUpdate: syncSeqNo(4)

[    CSSD]2013-06-28 14:42:14.897 [245775] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmDeactivateNode: node 0 () left cluster

 

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (1372388811/1372388811) prevConuni(1372388811) birth (1/0) (old/new)

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmDeactivateNode: node 1 (idb1) left cluster

 

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1372397396/1372397396) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2013-06-28 14:42:14.897 [131080] >USER:    clssnmHandleUpdate: SYNC(4) from node(2) completed

[    CSSD]2013-06-28 14:42:14.897 [131080] >USER:    clssnmHandleUpdate: NODE 2 (idb2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2013-06-28 14:42:14.897 [131080] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2013-06-28 14:42:14.897 [245775] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2013-06-28 14:42:14.897 [245775] >TRACE:   clssnmDoSyncUpdate: Sync Complete!

[    CSSD]2013-06-28 14:42:14.969 [278544] >TRACE:   clssgmReconfigThread:  started for reconfig (4)

[    CSSD]2013-06-28 14:42:14.969 [278544] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2013-06-28 14:42:14.970 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2013-06-28 14:42:14.970 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(1/1)

[    CSSD]2013-06-28 14:42:14.970 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_idb type 3

[    CSSD]2013-06-28 14:42:14.970 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_1_idb) birth(1/1)

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_idb type 2

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_1_idb) birth(1/1)

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_idb type 2

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_idb type 3

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock DBIDB type 2

[    CSSD]2013-06-28 14:42:14.971 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(DBIDB) birth(1/1)

[    CSSD]2013-06-28 14:42:14.972 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock DGIDB type 2

[    CSSD]2013-06-28 14:42:14.972 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(DGIDB) birth(1/1)

[    CSSD]2013-06-28 14:42:14.972 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock IGIDBALL type 2

[    CSSD]2013-06-28 14:42:14.972 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(IGIDBALL) birth(1/1)

[    CSSD]2013-06-28 14:42:14.973 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2

[    CSSD]2013-06-28 14:42:14.973 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(DAALL_DB) birth(1/1)

[    CSSD]2013-06-28 14:42:14.973 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2

[    CSSD]2013-06-28 14:42:14.973 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(1/1)

[    CSSD]2013-06-28 14:42:14.975 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2

[    CSSD]2013-06-28 14:42:14.975 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(1/1)

[    CSSD]2013-06-28 14:42:14.977 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_idb1 type 3

[    CSSD]2013-06-28 14:42:14.977 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_idb1) birth(1/1)

[    CSSD]2013-06-28 14:42:14.979 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_idb2 type 3

[    CSSD]2013-06-28 14:42:14.980 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2

[    CSSD]2013-06-28 14:42:14.980 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(1/1)

[    CSSD]2013-06-28 14:42:14.987 [278544] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2

[    CSSD]2013-06-28 14:42:14.988 [278544] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(1/1)

[    CSSD]2013-06-28 14:42:14.994 [278544] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 4

[    CSSD]2013-06-28 14:42:14.996 [196620] >TRACE:   clssgmPeerDeactivate: node 1 (idb1), death 4, state 0x80000000 connstate 0xa

[    CSSD]2013-06-28 14:42:14.996 [196620] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2013-06-28 14:42:14.996 [278544] >TRACE:   clssgmEstablishMasterNode: MASTER for 4 is node(2) birth(3)

[    CSSD]2013-06-28 14:42:14.996 [278544] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs

[    CSSD]2013-06-28 14:42:14.999 [278544] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2013-06-28 14:42:15.007 [278544] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 4 with 1 nodes

 

[    CSSD]CLSS-3001: local node number 2, master node number 2

 

The master node changed to idb2.

 

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb2   

ora.idb.idb1.inst                             ONLINE     OFFLINE          

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb2   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb2   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     OFFLINE          

ora.idb1.gsd                                  ONLINE     OFFLINE          

ora.idb1.ons                                  ONLINE     OFFLINE          

ora.idb1.vip                                  ONLINE     ONLINE on idb2   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

The VIP and the services failed over to idb2.

 

Powering idb1 back on

 

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb2   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb2   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb2   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

The final state is as above. The service is not restored automatically.

 

Relocating the service back via command

 

[oracle@idb2 ~]$ srvctl relocate service -d idb -s web -i idb2 -t idb1

[oracle@idb2 ~]$ crsstat

HA Resource                                   Target     State            

-----------                                   ------     -----            

ora.idb.db                                    ONLINE     ONLINE on idb2   

ora.idb.idb1.inst                             ONLINE     ONLINE on idb1   

ora.idb.idb2.inst                             ONLINE     ONLINE on idb2   

ora.idb.intra.cs                              ONLINE     ONLINE on idb2   

ora.idb.intra.idb2.srv                        ONLINE     ONLINE on idb2   

ora.idb.web.cs                                ONLINE     ONLINE on idb2   

ora.idb.web.idb1.srv                          ONLINE     ONLINE on idb1   

ora.idb1.LISTENER_IDB1.lsnr                   ONLINE     ONLINE on idb1   

ora.idb1.gsd                                  ONLINE     ONLINE on idb1   

ora.idb1.ons                                  ONLINE     ONLINE on idb1   

ora.idb1.vip                                  ONLINE     ONLINE on idb1   

ora.idb2.LISTENER_IDB2.lsnr                   ONLINE     ONLINE on idb2   

ora.idb2.gsd                                  ONLINE     ONLINE on idb2   

ora.idb2.ons                                  ONLINE     ONLINE on idb2   

ora.idb2.vip                                  ONLINE     ONLINE on idb2   

 

 
