Oracle Data Guard 19c: rfs (PID:11969): Possible network disconnect with primary database

One of the essential things in the configuration of Oracle Data Guard and Oracle Data Guard Broker is where the file tnsnames.ora is placed, which by default, from some other early releases of 19c it has been defined at $ORACLE_HOME/network/admin. However, in one of the environments that I was providing some job as a consultant, I could see the following outputs from the Oracle Data Guard broker utility :

Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/xxxxxxxxxxxxxxx@TECHDBP
Connected to "TECHDBP"
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - BROKERTECH

  Protection Mode: MaxPerformance
  Members:
  TECHDBP - Primary database
    Error: ORA-16778: redo transport error for one or more members

    TECHSTDB   - Physical standby database
      Warning: ORA-16853: apply lag has exceeded specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 31 seconds ago)

DGMGRL> show database TECHDBP;

Database - TECHDBP

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    tech1
    tech2
      Error: ORA-16737: the redo transport service for member "TECHSTDB" has an error

Database Status:
ERROR

DGMGRL> show database TECHDBP "statusreport";
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
                 tech2      ERROR ORA-16737: the redo transport service for member "TECHSTDB" has an error

Just a brief explanation on the output above to make it easier to understand from now on:

Standby database has db_unique_name as : TECHSTDB

Primary database has db_unique_name as : TECHDBP

Configuration was named as: BROKERTECH

“tech1” and “tech2” are the name of the instances for the primary database TECHDBP .

And, with the output above was found the error was reporting on the instance “tech2” that it was having a problem to connect to the standby database showing “tech2 ERROR ORA-16737: the redo transport service for member “TECHSTDB” has an error.”

So then I started digging to find out what the reason was. Therefore, I connected to the server where the instance “tech2” was located and started to check the alertSID.log file as below:

]$ tail -f alert_bo1.log
2021-09-08T21:00:26.188702+02:00
 rfs (PID:7142): Selected LNO:22 for T-1.S-1101 dbid 1530008807 branch 1081336999
2021-09-08T21:00:26.193743+02:00
ARC3 (PID:29273): Archived Log entry 1310 added for T-1.S-1100 ID 0x5b31c1e7 LAD:1
2021-09-08T21:04:45.459693+02:00
 rfs (PID:11969): Possible network disconnect with primary database
2021-09-08T21:04:45.460291+02:00
 rfs (PID:11959): Possible network disconnect with primary database
2021-09-08T21:04:45.460422+02:00
 rfs (PID:30571): Possible network disconnect with primary database
 rfs (PID:7142): Selected LNO:21 for T-1.S-1102 dbid 1530008807 branch 1081336999
2021-09-08T21:43:36.737978+02:00

Thus, I could have a hint at the cause by the message “RFS (PID:11959): Possible network disconnect with primary database”. And, then more information from inside the database:

SQL> SELECT thread#, dest_id, gvad.status, error, fail_sequence FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;

   THREAD#    DEST_ID STATUS
---------- ---------- ---------
ERROR                                                             FAIL_SEQUENCE
----------------------------------------------------------------- -------------
         1          1 VALID
                                                                              0

         1          2 VALID
                                                                              0

         2          1 VALID
                                                                              0


   THREAD#    DEST_ID STATUS
---------- ---------- ---------
ERROR                                                             FAIL_SEQUENCE
----------------------------------------------------------------- -------------
         2          2 ERROR
ORA-12154: TNS:could not resolve the connect identifier specified             0

Consequently, I checked the tnsnames.ora and found out that the alias to the service that it was reporting error was added in the file and there was no issue on the listener.ora. So what could be? It was when I decided to setup the TNS_ADMIN variable as I found out that it was not configured as below:

trace]$ srvctl getenv database -d TECHDBP -t "TNS_ADMIN"
TECHDBP:
PRKF-1128 : Environment variable TNS_ADMIN is not defined.

You do not have to set up this variable in a server where you have only the database running. Still, after a while, I learned that this environment has many applications that occasionally were using another TNS_ADMIN to their purpose. Knowing that, I decided to setup the variable TNS_ADMIN to oracle user on the CRS of this Oracle RAC configuration as below:

~]$ srvctl setenv database -d TECHDBP -T "TNS_ADMIN=/u01/app/oracle/19.3.0/db/network/admin"

Following by a restart of the database:

 ~]$ srvctl stop database -d TECHDBP
 ~]$ srvctl start database  -d TECHDBP

And then … Voilà! After the configuration, the Data Guard Broker output was changed successfully, and it started to operate normally:

admin]$ dgmgrl
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Sep 8 21:52:48 2021
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/xxxx@TECHDBP
Connected to "TECHDBP"
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - BROKERTECH

  Protection Mode: MaxPerformance
  Members:
  TECHDBP - Primary database
    TECHSTDB   - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 33 seconds ago)

DGMGRL>

I hope that this post helps you!

Related posts

Leave a Comment