Rekey causes VPN tunnel to stop sending network traffic

Hello everybody, I’m having a weird issue with VPNs between a Palo Alto Cloud Firewall (PAN-OS 9.1.3h) and Cisco Meraki Z3. All VPN tunnels are established properly, but after a random period of time, during the rekey step, a tunnel stays online but network traffic can no longer be sent. We currently have 5 of these connections.

I was able to capture a log, but I’m not able to troubleshoot it. Did some anonymization, see link attached. LOG

In the Meraki log, you can see that two steps happen repeatedly on a working tunnel:

inbound CHILD_SA

outbound CHILD_SA

At the time the error occurs, the outbound step is missing.
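To spot this in a longer log, something like the following could help. It is only a rough sketch: it assumes each log entry is one line of text containing either "inbound CHILD_SA" or "outbound CHILD_SA", and the sample lines and timestamps are made up for illustration, not taken from the real Meraki log.

```python
def find_incomplete_rekeys(lines):
    """Return indexes of 'inbound CHILD_SA' lines that are never
    followed by an 'outbound CHILD_SA' line before the next inbound one."""
    incomplete = []
    pending = None  # index of the last inbound line still waiting for its outbound
    for i, line in enumerate(lines):
        if "inbound CHILD_SA" in line:
            if pending is not None:
                incomplete.append(pending)
            pending = i
        elif "outbound CHILD_SA" in line:
            pending = None
    if pending is not None:
        incomplete.append(pending)
    return incomplete


# Hypothetical log lines; the real log format will differ.
sample = [
    "10:00:00 inbound CHILD_SA",
    "10:00:00 outbound CHILD_SA",
    "11:00:00 inbound CHILD_SA",
    "12:00:00 inbound CHILD_SA",
    "12:00:00 outbound CHILD_SA",
]
print(find_incomplete_rekeys(sample))  # [2]
```

The reported index points at the rekey where the outbound step never showed up, which gives a timestamp to compare against the Palo Alto side.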

Any ideas?

make the timeouts the same on both sides.

enable ike dead peer detection on the palo side

https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClFaCAK

then follow this to check for dpd mismatch

let us know if any progress… last i saw meraki was still doing ikev1? still the case?

Curious if you came up with a resolution? Facing the same issue between PAN and Meraki, started randomly.

Hello and thanks for your reply.

Meraki is doing IKEv2 now. All tunnels are set up with the same settings. There is no DPD option, but I assume this is now the Liveness Check option, and it was already turned on.

I also have a monitor running, pointing at the local GW of the peer’s LAN.

We have a NAT scenario on all sites where the Z3 are installed. Public static address with common router. Z3 is connected to this router.

On Palo side

IPSec Crypto profile
IPSec Protocol: ESP
DH group: 2
Lifetime: 1h
Encryption: aes-256-gcm/cbc
Authentication: sha256

IKE Crypto profile
DH Group: group2
Encryption: aes-256-cbc
Authentication: sha256
Key Lifetime: 8h
IKEv2 Authentication Multiple: 5

On Meraki side

Phase1
Encryption: AES 256
Authentication: SHA256
Pseudo-random Function: Defaults to Authentication
Diffie-Hellman group: 2
Lifetime (sec): 28800

Phase2
Encryption: AES 256
Authentication: SHA256
PFS group: 2
Lifetime (sec): 3600

Palo Alto IKE GW Options
Passive mode: Enabled
NAT-T: Enabled

Advanced Option
Strict Cookie Validation: turned off
Liveness Check Interval (sec): 5
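As a quick sanity check on the "same timeouts on both sides" advice, the lifetimes and DH/PFS groups listed above can be compared after converting everything to seconds. This is just a sketch: the dictionary keys are names I made up for illustration, and only the values come from the settings posted here.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(value):
    """Convert values like '8h' or 28800 to an integer number of seconds."""
    text = str(value).strip().lower()
    if text[-1] in UNIT_SECONDS:
        return int(text[:-1]) * UNIT_SECONDS[text[-1]]
    return int(text)


def mismatches(side_a, side_b):
    """Return {key: (a_value, b_value)} for every setting that differs."""
    result = {}
    for key in sorted(set(side_a) | set(side_b)):
        a, b = side_a.get(key), side_b.get(key)
        if key.endswith("_lifetime"):
            a = to_seconds(a) if a is not None else None
            b = to_seconds(b) if b is not None else None
        if a != b:
            result[key] = (a, b)
    return result


# Key names are hypothetical; values are the ones from this post.
palo = {"phase1_lifetime": "8h", "phase2_lifetime": "1h",
        "phase1_dh": 2, "phase2_pfs": 2}
meraki = {"phase1_lifetime": 28800, "phase2_lifetime": 3600,
          "phase1_dh": 2, "phase2_pfs": 2}

print(mismatches(palo, meraki))  # {}
```

An empty result means 8h and 1h on the Palo side line up with 28800 and 3600 seconds on the Meraki side, so the lifetimes themselves are not the mismatch here.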

In the end I updated the Palo to a higher firmware and the Z3 to the latest beta which stopped the permanent disconnects. Got 25 Tunnels running now and I have to restart only 1 or 2 per week.

for testing i would try weakening PFS to off and sha256 to sha1

more often than not we see this fix vpn issues between two vendors; if it does fix the issue, definitely open tickets with both vendors

also, as another thing to check: make sure the proxy-ids mirror each other and one side is not sending more proxy-ids than the other side is expecting. we have seen this lead to problems with asa to palo.
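if it helps, the mirroring check can be sketched like this. it assumes each proxy-id is written down as a (local subnet, remote subnet) pair per side; the subnets below are made-up examples, not anyone's real config.

```python
import ipaddress


def normalize(pairs):
    """Normalize (local, remote) subnet strings so equal networks compare equal."""
    return {
        (str(ipaddress.ip_network(local)), str(ipaddress.ip_network(remote)))
        for local, remote in pairs
    }


def unmirrored(side_a, side_b):
    """Return (only_a, only_b): proxy-ids on each side with no mirrored
    (local/remote swapped) counterpart on the other side."""
    a, b = normalize(side_a), normalize(side_b)
    mirror_a = {(remote, local) for local, remote in a}
    mirror_b = {(remote, local) for local, remote in b}
    return sorted(a - mirror_b), sorted(b - mirror_a)


# Hypothetical proxy-ids for illustration only.
palo_ids = [("10.0.0.0/24", "192.168.1.0/24"),
            ("10.0.1.0/24", "192.168.1.0/24")]
peer_ids = [("192.168.1.0/24", "10.0.0.0/24")]

only_palo, only_peer = unmirrored(palo_ids, peer_ids)
print(only_palo)  # [('10.0.1.0/24', '192.168.1.0/24')]
print(only_peer)  # []
```

anything left in either list is a proxy-id one side is offering that the other side does not expect.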

thanks