Cisco recently issued an update regarding memory failures in Nexus 9300 Switches and APIC Servers. Here is a brief overview of the issue and a link to the complete Field Notice from Cisco.
Problem Description:
A limited number of dual in-line memory modules (DIMMs) shipped from Cisco are impacted by a known deviation in the memory supplier's manufacturing process. This deviation can result in a higher-than-expected rate of failure.
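Products Affected:
Nexus 9300 Switches and APIC Servers. See the Field Notice linked below for the specific affected units.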
Problem Symptom:
Most DIMMs with this manufacturing deviation will exhibit persistent correctable memory errors. If the DIMM is not replaced, it can eventually encounter an uncorrectable memory event. An uncorrectable error at runtime causes an unexpected switch reset.
Various DIMM Reliability, Availability, and Serviceability (RAS) features or even operating system features can mask the extent of these correctable errors.
Solution:
Customers should replace affected DIMMs to avoid an unexpected switch or server failure.
For more information, or to identify affected products, use the Serial Number Validation Tool at https://www.cisco.com/c/en/us/support/docs/field-notices/724/fn72464.html