A generic fuzzy text matcher applied to an MRO catalog will return impressive duplicate-detection numbers. It will find every bearing record that shares a manufacturer name and a nominal size description. It will flag every valve entry that shares a connection type and pressure rating keyword. And it will be wrong on a meaningful percentage of them in ways that create operational risk.

The industrial spare-parts catalog presents a text-matching challenge that generic similarity algorithms were not designed to solve. A 2-inch gate valve and a 4-inch gate valve share every keyword except one. A bearing with a 40mm bore and a bearing with a 45mm bore have identical descriptions except for a single numeric token. A Class 150 flange and a Class 300 flange (often written as 150-pound and 300-pound) are physically non-interchangeable despite sharing most of their description. When these pairs are scored purely on text similarity, they produce high confidence scores, and an ERP deduplication workflow will act on those scores.
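
To make the problem concrete, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library as a stand-in for a generic fuzzy matcher. The item descriptions and the `text_similarity` helper are illustrative assumptions, not records or functions from any real catalog or product.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical catalog descriptions that differ only in the size token.
valve_2in = "VALVE, GATE, 2 IN, CLASS 150, CARBON STEEL, FLANGED"
valve_4in = "VALVE, GATE, 4 IN, CLASS 150, CARBON STEEL, FLANGED"

score = text_similarity(valve_2in, valve_4in)

# The two valves are not interchangeable, yet the score is close to 1.0
# because only one character out of the whole description differs.
print(f"similarity: {score:.3f}")
```

A matcher that thresholds on this score alone has no way to tell that the single differing character is the one that matters most.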

The consequence in an industrial environment is not a data quality issue. It is a maintenance risk. If two valves of different sizes are consolidated into a single SKU, a maintenance order that requires the smaller size may receive the larger one. The installation fails. If it is a safety-critical system, the consequences escalate.

PartsCleanse AI applies discriminator penalties after the initial similarity scoring pass. The engine extracts size tokens, pressure class signals, material family indicators, model number components, functional subtypes, and commercial unit-of-measure values from each item description. When two items score highly on text similarity but conflict on any of these discriminator dimensions, the similarity score is penalized and the confidence tier is downgraded.
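
The general idea of a discriminator penalty can be sketched as follows. This is not the PartsCleanse AI implementation: the regular expressions, the two dimensions covered (size and pressure class), the multiplicative penalty, and the names `extract_discriminators` and `penalized_score` are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical patterns for two discriminator dimensions. A real catalog
# would need more (material, model number, subtype, unit of measure).
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(inch|in|mm)\b", re.IGNORECASE)
CLASS_RE = re.compile(r"class\s*(\d+)|(\d+)\s*(?:#|lb\b)", re.IGNORECASE)


def extract_discriminators(description: str) -> dict[str, frozenset]:
    """Pull size and pressure-class tokens out of a free-text description."""
    sizes = frozenset(
        (float(value), "mm" if unit.lower() == "mm" else "in")
        for value, unit in SIZE_RE.findall(description)
    )
    classes = frozenset(int(a or b) for a, b in CLASS_RE.findall(description))
    return {"size": sizes, "pressure_class": classes}


def penalized_score(a: str, b: str, penalty: float = 0.5) -> tuple[float, list[str]]:
    """Text similarity, multiplied by `penalty` once per conflicting dimension.

    A dimension conflicts only when both items state a value and the values
    differ; a value missing on one side is not treated as a conflict here.
    """
    base = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    da, db = extract_discriminators(a), extract_discriminators(b)
    conflicts = [k for k in da if da[k] and db[k] and da[k] != db[k]]
    return base * (penalty ** len(conflicts)), conflicts


score, conflicts = penalized_score(
    "VALVE, GATE, 2 IN, CLASS 150, FLANGED",
    "VALVE, GATE, 4 IN, CLASS 150, FLANGED",
)
print(f"penalized score: {score:.3f}, conflicts: {conflicts}")
```

Running the penalty as a second pass keeps the text-similarity step generic and puts all of the domain knowledge in the extraction rules, which can be reviewed and extended one dimension at a time.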

As a result, a 2-inch valve and a 4-inch valve with otherwise identical descriptions will not appear in Tier 1. They will appear in Tier 2 or Tier 3, the specialist-review tiers, with a note on the dimension conflict. A maintenance engineer reviewing the finding will see the size discrepancy explicitly and can make an informed decision to consolidate the records or keep them separate.
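
One way to picture the tier downgrade is the sketch below. The thresholds, the `Finding` structure, and the rule that any discriminator conflict excludes a pair from Tier 1 are assumptions made for illustration; the source does not state the actual cut-offs or how Tier 2 is separated from Tier 3.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """A candidate duplicate pair after the penalty pass (hypothetical shape)."""
    score: float
    conflicts: list[str] = field(default_factory=list)


def assign_tier(finding: Finding, tier1_min: float = 0.90, tier2_min: float = 0.60) -> int:
    """Map a finding to a review tier. Threshold values are assumed."""
    # Any discriminator conflict keeps the pair out of Tier 1, whatever the score.
    if not finding.conflicts and finding.score >= tier1_min:
        return 1
    if finding.score >= tier2_min:
        return 2
    return 3


def review_note(finding: Finding) -> str:
    """Short note shown to the reviewer alongside the finding."""
    if not finding.conflicts:
        return "No discriminator conflicts detected."
    return "Conflict on: " + ", ".join(finding.conflicts)


# A size conflict has already pulled the score down in the penalty pass.
valve_pair = Finding(score=0.49, conflicts=["size"])
print(assign_tier(valve_pair), review_note(valve_pair))
```

Checking for conflicts explicitly, instead of relying on the penalized score alone, means a pair with a known dimension conflict cannot reach Tier 1 even if the penalty factor is later tuned down.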

This is why the tiered report structure is not a user-experience feature. It is the governance mechanism. Tier 1 enables operations to move quickly on obvious duplicates. Tiers 2 and 3 create a controlled review backlog for findings where rapid consolidation would be inappropriate. The goal is not to maximize the duplicate count. It is to produce a finding that can be acted on safely.

This analysis supports the PartsCleanse AI diagnostic thesis: quantify the problem first, govern the review, then scale. The AI Adoption Framework defines the full six-stage governance sequence.