Compatible with SAP · IBM Maximo · Oracle ERP · Hexagon EAM · Infor · Any CMMS
PartsCleanse AI Methodology — How MRO Catalog Deduplication Works Without False Positives

Why false-positive control is the hardest — and most important — part of MRO deduplication.

PartsCleanse AI is not a generic fuzzy matcher. It is a two-stage industrial catalog diagnostic engineered to surface duplicate records while protecting operations from unsafe consolidation decisions.

The assurance thesis

Finding similar descriptions is the easy part. The hard part is knowing when NOT to merge.

A 2-inch gate valve and a 4-inch gate valve share most of their description words. A bearing in steel and a bearing in bronze look nearly identical in text. In a generic fuzzy matcher, both pairs would be flagged as duplicates. In an operating environment, consolidating them would be a maintenance and procurement error.

PartsCleanse AI applies discriminator penalties after fuzzy scoring -- not instead of it. Size, pressure rating, material family, model number, functional subtype, and pack-vs-each conflicts all reduce the confidence score of otherwise similar pairs, routing them to a specialist review backlog rather than an automatic consolidation queue.

The output is tiered by design. Tier 1 accelerates obvious consolidation candidates. Tiers 2 and 3 create a governed review queue for items that require engineering, procurement, or master-data authorization before any ERP action is taken.

The non-negotiable client safety standard
No item master record should be retired from ERP solely on algorithmic output. The engine provides evidence and a recommendation. The client review process -- with named owners and authorization tiers -- is what authorizes the action.
Tier 1: High confidence -- accelerate for ERP review and consolidation
Tier 2: Probable duplicate -- procurement and engineering sign-off required
Tier 3: Possible duplicate -- specialist authorization before any ERP action
The six method controls

Every layer of the engine is a governance decision.

Control

Input profiling

Input profiling is the process by which Engine 1 classifies the catalog as abbreviated, verbose, mixed, or clean before any duplicate scoring begins. This classification determines which normalization rules are applied, ensuring raw SAP abbreviations and verbose descriptions are treated differently.
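
As an illustration only, catalog profiling could be sketched as a per-record heuristic followed by a majority vote. The abbreviation list, the 0.3 and 0.7 thresholds, and the names `classify_record` and `profile_catalog` are assumptions made for this example, not the engine's actual rules.

```python
import re
from collections import Counter

# Hypothetical abbreviation list; a production profile would use a far larger MRO dictionary.
KNOWN_ABBREVIATIONS = {"VLV", "BRG", "HP", "SS", "CS", "ASSY", "GSKT", "FLG"}


def classify_record(description: str) -> str:
    """Label one description as 'abbreviated', 'verbose' or 'clean' (assumed heuristics)."""
    tokens = re.findall(r"[A-Z0-9]+", description.upper())
    hits = sum(1 for token in tokens if token in KNOWN_ABBREVIATIONS)
    if tokens and hits / len(tokens) >= 0.3:  # assumed threshold
        return "abbreviated"
    if len(description) > 40:  # longer than a 40-character short text
        return "verbose"
    return "clean"


def profile_catalog(descriptions: list[str], dominance: float = 0.7) -> str:
    """Return the dominant record label, or 'mixed' when no label dominates."""
    if not descriptions:
        raise ValueError("cannot profile an empty catalog")
    counts = Counter(classify_record(d) for d in descriptions)
    label, count = counts.most_common(1)[0]
    return label if count / len(descriptions) >= dominance else "mixed"
```

The returned label would then select which normalization rule set runs next.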

Control

Adaptive normalization

Adaptive normalization converts concatenated units, material grade variants, manufacturer name aliases, and MRO-specific abbreviations into a standardized form before comparison. Without this step, identical parts described in different conventions would not match.
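
A minimal sketch of this kind of normalization, assuming small hand-written lookup tables. The manufacturer "ACME", the grade variants, and the unit list are hypothetical placeholders; a real rule set would be far larger and catalog-specific.

```python
import re

# Hypothetical lookup tables, for illustration only.
UNIT_CANONICAL = {"INCHES": "IN", "INCH": "IN", "IN": "IN", "MM": "MM", "PSI": "PSI"}
GRADE_VARIANTS = {"SS316": "316SS", "316 SS": "316SS", "SS 316": "316SS"}
MANUFACTURER_ALIASES = {"ACME CORP": "ACME", "ACME INC": "ACME"}

# A number followed (with or without a space) by a known unit, e.g. "4IN" or "4 INCH".
_UNIT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(INCHES|INCH|IN|MM|PSI)\b")


def normalize(description: str) -> str:
    """Rewrite a description into one standardized convention before comparison."""
    text = description.upper()
    text = re.sub(r'(\d)\s*"', r"\1 IN", text)  # 4" -> 4 IN
    text = _UNIT_RE.sub(lambda m: f"{m.group(1)} {UNIT_CANONICAL[m.group(2)]}", text)
    for variant, canonical in GRADE_VARIANTS.items():
        text = text.replace(variant, canonical)
    for alias, canonical in MANUFACTURER_ALIASES.items():
        text = re.sub(rf"\b{re.escape(alias)}\b", canonical, text)
    # Drop punctuation but keep decimal points that sit between digits (e.g. 1.5).
    text = re.sub(r"[^A-Z0-9. ]+|(?<!\d)\.|\.(?!\d)", " ", text)
    return " ".join(text.split())
```

With these tables, `normalize('Gate Valve 4" SS 316 Acme Corp.')` and `normalize("GATE VALVE 4IN 316 SS ACME INC")` produce the same string, so the two records become directly comparable.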

Control

Weighted blocking

Weighted blocking uses TF-IDF scoring to group candidate pairs by shared significant tokens, reducing unnecessary comparisons while preserving rare diagnostic terms -- such as pressure classes and model suffixes -- that carry high discriminating value.
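
One way to picture weighted blocking is the sketch below. It assumes whitespace tokenization, a smoothed IDF of `log(n / df) + 1`, and two blocking keys per record; none of these choices is confirmed by the source, and `candidate_pairs` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def candidate_pairs(descriptions: list[str], keys_per_record: int = 2) -> set[tuple[int, int]]:
    """Pair up records that share at least one of their highest-weight tokens."""
    docs = [d.upper().split() for d in descriptions]
    n = len(docs)
    # Document frequency: in how many records each token appears.
    df = Counter(token for tokens in docs for token in set(tokens))

    blocks: dict[str, set[int]] = defaultdict(set)
    for idx, tokens in enumerate(docs):
        tf = Counter(tokens)
        # TF-IDF weight; tokens seen in only one record can never link two records.
        weights = {t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf if df[t] > 1}
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        for token in ranked[:keys_per_record]:
            blocks[token].add(idx)

    pairs: set[tuple[int, int]] = set()
    for members in blocks.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

Because rarer shared tokens receive higher weights, a term such as a pressure class or model suffix tends to be chosen as a blocking key ahead of common words like VALVE. Only pairs inside a block go on to full scoring.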

Control

Composite scoring

Composite scoring combines description similarity, token overlap, manufacturer part-number agreement, manufacturer alias resolution, and unit-of-measure alignment into a single confidence score. No single signal alone determines a duplicate finding.
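
The combination of signals can be sketched as a weighted sum. The weights, the `Item` fields, the alias table, and the use of `difflib.SequenceMatcher` for description similarity are all assumptions chosen for illustration; the source does not state how the engine weights or computes each signal.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Item:
    description: str
    manufacturer: str
    mpn: str  # manufacturer part number
    uom: str  # unit of measure


# Hypothetical weights (summing to 1.0) and alias table.
WEIGHTS = {"description": 0.35, "token_overlap": 0.25, "mpn": 0.20, "manufacturer": 0.10, "uom": 0.10}
MANUFACTURER_ALIASES = {"ACME CORP": "ACME", "ACME INC": "ACME"}


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def _clean_mpn(mpn: str) -> str:
    return re.sub(r"[^A-Z0-9]", "", mpn.upper())


def _resolve_manufacturer(name: str) -> str:
    key = " ".join(name.upper().split())
    return MANUFACTURER_ALIASES.get(key, key)


def composite_score(a: Item, b: Item) -> float:
    """Blend five signals into one confidence score between 0 and 1."""
    ta, tb = _tokens(a.description), _tokens(b.description)
    mpn_a, mpn_b = _clean_mpn(a.mpn), _clean_mpn(b.mpn)
    mfr_a, mfr_b = _resolve_manufacturer(a.manufacturer), _resolve_manufacturer(b.manufacturer)
    signals = {
        "description": SequenceMatcher(None, a.description.upper(), b.description.upper()).ratio(),
        "token_overlap": len(ta & tb) / len(ta | tb) if ta | tb else 0.0,
        "mpn": 1.0 if mpn_a and mpn_a == mpn_b else 0.0,
        "manufacturer": 1.0 if mfr_a and mfr_a == mfr_b else 0.0,
        "uom": 1.0 if a.uom.strip().upper() == b.uom.strip().upper() else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```

Because every weight is below 1.0, no single signal can push a pair to full confidence on its own, which mirrors the principle stated above.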

Control

Critical discriminators

Critical discriminators are the attributes whose disagreement penalizes otherwise similar pairs: size, pressure rating, material family, part category, functional subtype, model number, and pack-vs-each unit conflicts. A 2-inch valve and a 4-inch valve share most description words but are not interchangeable -- the discriminator prevents an unsafe match.
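
A hedged sketch of how a discriminator penalty might be applied after fuzzy scoring. It covers only three of the attributes listed above; the regular expressions, the material table, and the penalty factors are assumptions for illustration, as is the rule that a penalty applies only when both records state the attribute.

```python
import re

# Hypothetical material families and penalty factors.
MATERIAL_FAMILIES = {"316SS": "stainless", "304SS": "stainless", "STEEL": "steel", "BRONZE": "bronze"}
PENALTIES = {"size": 0.5, "pressure": 0.5, "material": 0.4}


def extract_attributes(description: str) -> dict:
    """Pull size (inches), pressure class and material family out of free text."""
    text = description.upper()
    attrs: dict = {}
    size = re.search(r"(\d+(?:\.\d+)?)[\s-]*(?:INCH|IN)\b", text)
    if size:
        attrs["size"] = float(size.group(1))
    pressure = re.search(r"\b(?:CLASS|CL)\s*(\d+)\b", text)
    if pressure:
        attrs["pressure"] = int(pressure.group(1))
    for token in re.findall(r"[A-Z0-9]+", text):
        if token in MATERIAL_FAMILIES:
            attrs["material"] = MATERIAL_FAMILIES[token]
            break
    return attrs


def apply_discriminators(score: float, desc_a: str, desc_b: str) -> tuple[float, list[str]]:
    """Reduce a fuzzy score for each attribute that both records state but disagree on."""
    a, b = extract_attributes(desc_a), extract_attributes(desc_b)
    conflicts = [name for name in PENALTIES if name in a and name in b and a[name] != b[name]]
    for name in conflicts:
        score *= 1.0 - PENALTIES[name]
    return score, conflicts
```

Under these assumed factors, a 2-inch and a 4-inch gate valve that scored 0.9 on text similarity drop to 0.45, and the returned conflict list records why, which gives a reviewer the evidence behind the lower score.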

Control

Governed output

Governed output means findings are confidence-tiered into three review bands rather than delivered as automatic ERP deletion instructions. Tier 1 accelerates obvious duplicates. Tiers 2 and 3 create an engineering and procurement review backlog for items requiring specialist authorization.
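
The tiering step could look like the sketch below. The score cut-offs (0.90, 0.75, 0.60) and the record layout are hypothetical; only the three tier descriptions come from the text above.

```python
from typing import Optional

# (minimum score, tier, required review action); cut-offs are assumed values.
BANDS = [
    (0.90, "Tier 1", "Accelerate for ERP review and consolidation"),
    (0.75, "Tier 2", "Procurement and engineering sign-off required"),
    (0.60, "Tier 3", "Specialist authorization before any ERP action"),
]


def route_finding(score: float) -> Optional[dict]:
    """Map a confidence score to a review band; never emit a deletion instruction."""
    for cutoff, tier, action in BANDS:
        if score >= cutoff:
            return {"tier": tier, "action": action, "auto_delete": False}
    return None  # below every band: not reported as a duplicate finding
```

Every routed finding carries `auto_delete: False`, reflecting the stated standard that the engine recommends and the client review process authorizes.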

Domain-specific depth

Why generic deduplication tools fail in industrial MRO environments.

Industrial catalog complexity

SAP 40-character description limit

SAP MM material master descriptions are limited to 40 characters, so legacy catalogs abbreviate heavily. "HIGH PRESSURE GATE VALVE 4IN 316SS" may be stored as "HP GATE VLV 4IN 316SS", a description that a generic text model will not match to its verbose equivalent without industrial abbreviation normalization.
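
To make the effect concrete, the sketch below compares the abbreviated and verbose forms before and after a simple dictionary expansion. The three-entry abbreviation table is a hypothetical stand-in for an industrial dictionary, and `difflib.SequenceMatcher` stands in for whatever similarity measure a generic tool might use.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real MRO dictionaries are much larger.
ABBREVIATIONS = {"HP": "HIGH PRESSURE", "VLV": "VALVE", "BRG": "BEARING"}


def expand_abbreviations(description: str) -> str:
    """Replace each known abbreviation token with its long form."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in description.upper().split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


short = "HP GATE VLV 4IN 316SS"
verbose = "HIGH PRESSURE GATE VALVE 4IN 316SS"

raw_similarity = similarity(short, verbose)                             # below 1.0
expanded_similarity = similarity(expand_abbreviations(short), verbose)  # exactly 1.0
```

After expansion the two descriptions are identical, whereas the raw strings match only partially.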

False-positive risk taxonomy

Critical discriminator classes

Industrial IQ identifies seven classes of discriminators that must be preserved to prevent unsafe consolidation: nominal size, pressure rating, material family, functional subtype, model number, unit-of-measure (pack vs. each), and manufacturer specificity. Missing any one of these produces technically plausible but operationally dangerous merge recommendations.

ERP migration risk

Why catalog defects compound in migrations

SAP S/4HANA and Oracle Cloud migrations impose a hard deadline on catalog quality. Item masters migrated with duplicate records carry the disorder into the new system under a fresh data-load timestamp, which obscures the audit trail and makes post-cutover remediation significantly more expensive than pre-migration deduplication.

"The engine is not trying to find similar descriptions. It is trying to find descriptions that are similar enough to be the same part, while being different enough in the ways that matter to maintenance and procurement."
PartsCleanse AI Design Principle — Discrimination before consolidation