Using AI to turn 40,000 Unstructured Products Into a Navigable, SEO-Ready Catalog

We built an AI-driven attribute extraction system that processes tens of thousands of unstructured product listings, cross-referencing official documentation and established retailer catalogs to generate clean, import-ready attribute data for categories where manual classification was impossible and ERP/PIM data didn’t exist.

Client

ePlaneta

Expertise
  • AI-Powered Data Extraction
  • Product Attribute Modeling
  • Faceted Navigation Architecture
  • Agentic Validation Systems
  • Large-Scale Data Normalization
  • eCommerce SEO Infrastructure
Year

2025

ePlaneta is one of Serbia’s leading online retailers, offering over 170,000 products across 2,000+ categories. With around 1,500,000 monthly visitors, the platform serves a wide range of customer needs. As part of an ongoing SEO partnership, we had already developed a semi-automated faceted navigation system capable of programmatically generating thousands of subcategories from attribute-category combinations, a proven driver of long-tail organic visibility. But the system had a critical dependency: it only works when products have clean, complete attribute data. For many of ePlaneta’s largest categories, that data simply didn’t exist.

Client words

For the past three years, Granular Group has been ePlaneta’s go-to partner for scaling our digital presence. They transformed SEO into our most effective acquisition channel, driving substantial year-over-year organic growth and helping us support half a million monthly visitors across our vast 170,000-item catalog. Their strategic guidance fueled rapid expansion in key verticals such as household appliances, IT hardware, footwear, and apparel, solidifying our leadership in each.

Their greatest impact has been in empowering us with clear, actionable insights. They built a bespoke controlling system that tracks performance at the product, category, and keyword level, surfacing opportunities and flagging issues the moment they arise. This real-time visibility has enabled our teams to optimize campaigns swiftly, fine-tune content, and maintain steady conversion gains across the board.

With their proactive, hands-on collaboration, Granular feels like an extension of our team. They navigate cross-departmental hurdles with ease, continuously unlocking new revenue streams and ensuring ePlaneta is poised for sustainable growth in Serbia’s competitive eCommerce market.

Branimir Kulašević, Chief Ecommerce Officer

Challenges

  • Faceted Navigation Requires Data That Doesn’t Exist

    The semi-automated faceting system we built for ePlaneta can programmatically generate thousands of subcategories from attribute-category combinations, but only if products have clean, structured attributes in the backend. For many of ePlaneta’s highest-volume categories, the ERP and PIM contained no usable attribute data. Products were listed with free-text names and nothing else. The faceting engine was ready; the data to feed it was not. This wasn’t an isolated gap. It was a systemic issue affecting categories across the catalog wherever supplier data was incomplete, inconsistent, or simply missing.

  • Product Naming Was Inconsistent and Unreliable

    Even product names, the one piece of data that always existed, couldn’t be trusted as an attribute source. Naming conventions varied wildly across suppliers: some used standardized identifiers, others used internal codes, some mixed languages, and many embedded multiple attributes into a single unstructured string. The same product characteristic might appear in dozens of different formats depending on who supplied the listing. Typos, abbreviations, and idiosyncratic formatting made simple string parsing unreliable at scale.

  • Mixed Product Types Within Single Categories

    Large categories often contained fundamentally different product types lumped together with no classification. Without structured type attributes, these couldn’t be separated into meaningful subcategories, and each type often required different faceting logic and different attribute structures. The lack of type-level classification compounded every downstream problem: faceting, filtering, and SEO targeting all depended on distinctions that didn’t exist in the data.

  • Platform Display Limitations Amplified the Problem

    Magento imposed a hard limit of 10,000 products per paginated category view. Any category exceeding this threshold had products that were simply invisible to browsing users. This made faceted subcategories not just an SEO advantage but a product discoverability necessity: without attribute-driven navigation, large portions of inventory could never be found through normal browsing.

  • Manual Attribution Was Not Viable

    With tens of thousands of products requiring attribute assignment across multiple categories, manual research and classification was not a realistic approach. Even if completed, the result would immediately begin degrading as new products entered the catalog. Any solution needed to handle both the existing backlog and the ongoing flow of new, unstructured listings.

  • The Smartphone Accessories Category Crystallized the Problem

    One category made the scale of the challenge impossible to ignore: smartphone cases, covers, screen protectors, and foils, with over 40,000 products and zero structured attributes. No brand, no model, no product type. Analysis revealed approximately 25,000 cases, 10,000 pouches, 4,600 sleeves, and 4,400 screen protectors all mixed into a single category. Product names contained model identifiers in inconsistent formats, product line names that resembled model names, and multi-device listings that defied simple classification. This category became the proving ground for the AI system. If it could solve attribution here, it could work across the catalog.

Solution

  • AI-Powered Attribute Extraction Pipeline

    We developed an automated system that processes product listings through an AI pipeline to identify and extract structured attributes. Rather than relying solely on parsing the product name string, which had proven unreliable due to naming inconsistency, the system searches for official product documentation and manufacturer specifications, cross-referencing each product against the structured catalogs of established, well-organized retailers. This approach treats well-attributed competitor catalogs as a reference standard, using them to resolve ambiguities that the product name alone cannot. The pipeline is category-agnostic by design: the same architecture handles phone accessories, electronics peripherals, or any other category where attribute data is missing.

  • Reference-Based Attribute Normalization

    Raw attribute identification is only half the problem. The same characteristic can appear in dozens of naming variations across suppliers: abbreviated, expanded, with or without qualifiers, or split across identifiers that consumers treat as equivalent. The system normalizes all extracted attributes against a controlled vocabulary per category, grouping variations into standardized identifiers. In the smartphone accessories category, for example, this reduced approximately 830 raw model variations to 234 normalized models, each consistently named and ready for faceted navigation. The normalization logic is guided by both search demand data (how users actually search) and faceting practicality (ensuring each subcategory contains enough products to be useful).

  • Agentic Validation Guardrails

    AI extraction at this scale inevitably produces errors: hallucinated identifiers, misclassified attributes, and false matches where a product line name resembles a valid attribute value. We implemented agentic validation layers that cross-check extracted attributes against known reference databases, flag low-confidence extractions for manual review, and verify that the identified attribute combinations correspond to real, valid entities. Products that cannot be attributed with sufficient confidence are excluded from the output rather than allowed to introduce errors into the faceting system. This guardrail architecture is what makes the difference between a bulk AI output with unpredictable quality and a production-grade dataset suitable for import.

  • Product Type Classification

    Before attribute extraction, the system classifies products within a category into their correct types based on name analysis and product characteristics. This enables type-specific processing (different product types exhibit different naming patterns and require different attribute structures) and supports the creation of parallel faceted navigation hierarchies where needed. In practice, this also resolves strategic category architecture questions: which types should be grouped together and which warrant separate faceting structures.

  • Import-Ready Output Format

    The final output is a structured document ready for direct import into the eCommerce backend, each row containing the product URL, product name, SKU, and the relevant normalized attribute facets. No intermediate transformation or manual formatting is required between the pipeline output and the Magento import, eliminating friction from every batch.

  • Scalable Across the Catalog

    The system was designed not as a one-time data cleanup for a single category but as a reusable pipeline. The core extraction, validation, and normalization architecture remains consistent across categories, while the reference catalogs, normalization rules, and attribute taxonomies are reconfigured per category. This means faceted navigation expansion no longer depends on complete ERP data as a prerequisite. Any category with an attribute gap can be processed through the pipeline and brought into the faceting system.
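
To make the import-ready output described above more concrete, here is a minimal sketch of how one batch might be serialized. The source only states that each row carries the product URL, product name, SKU, and normalized facets; the column headers, facet names, example URL, and the `to_import_csv` helper are all assumptions for illustration, not the actual Magento import schema.

```python
import csv
import io

# Hypothetical column layout: URL, name, and SKU come from the description
# above; the facet columns (product_type, brand, model) are assumed names.
FIELDNAMES = ["url", "name", "sku", "product_type", "brand", "model"]


def to_import_csv(rows):
    """Serialize attributed products to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing facets become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in FIELDNAMES})
    return buffer.getvalue()


# Illustrative row only; not real catalog data.
example_rows = [
    {
        "url": "https://example.com/p/1001",
        "name": "Silicone case for Galaxy S23",
        "sku": "SKU-1001",
        "product_type": "case",
        "brand": "Samsung",
        "model": "Galaxy S23",
    }
]
csv_text = to_import_csv(example_rows)
```

Fixing the column order in one place keeps every batch structurally identical, which is what allows an import step to run without manual reformatting.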

Results

  • Scalable Attribution System

    What was previously a manual, category-by-category data problem now has a repeatable, AI-driven solution that can be applied across ePlaneta’s 2,000+ category catalog, processing new categories and ongoing inventory additions without manual research.

  • Hundreds of New Subcategory Pages

    Each processed category generates a structured set of faceted subcategories that immediately become targetable for long-tail SEO, while simultaneously solving the browsing and discoverability problems created by oversized, unfiltered category pages.

  • Faceted Navigation Unblocked at Scale

    The dependency that had limited faceting to categories with clean ERP data has been removed. The system now enables faceted navigation for any category in the catalog, regardless of the state of supplier-provided data.

Faceting Dependency Audit

We mapped the full catalog to identify which categories had sufficient attribute data for faceted navigation and which were blocked by data gaps. This produced a prioritized backlog of categories where the AI system would deliver the highest impact, weighted by product count, SEO potential, and the severity of the existing UX and discoverability problems.
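
One way to picture the prioritization is as a weighted score over the three factors named above. The sketch below is illustrative only: the weights, the 0-to-1 scales, the coverage threshold, and the category figures are assumptions, not the values used in the audit.

```python
from dataclasses import dataclass


@dataclass
class CategoryAudit:
    name: str
    product_count: int
    seo_potential: float       # assumed 0..1 scale, e.g. scaled search demand
    ux_severity: float         # assumed 0..1 scale of browsing problems
    attribute_coverage: float  # share of products with usable attributes


def priority_score(cat, max_products, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of relative size, SEO potential, and UX severity."""
    w_size, w_seo, w_ux = weights  # hypothetical weights
    size = cat.product_count / max_products if max_products else 0.0
    return w_size * size + w_seo * cat.seo_potential + w_ux * cat.ux_severity


def build_backlog(categories, coverage_threshold=0.8):
    """Keep only categories blocked by data gaps, highest score first."""
    blocked = [c for c in categories if c.attribute_coverage < coverage_threshold]
    if not blocked:
        return []
    max_products = max(c.product_count for c in blocked)
    return sorted(blocked, key=lambda c: priority_score(c, max_products), reverse=True)


# Made-up figures for illustration.
audits = [
    CategoryAudit("cables", 5000, 0.5, 0.3, 0.2),
    CategoryAudit("phone accessories", 40000, 0.9, 1.0, 0.0),
    CategoryAudit("laptops", 3000, 0.8, 0.1, 0.95),
]
backlog = build_backlog(audits)
```

Categories that already have enough attribute coverage drop out before scoring, so the backlog contains only work the pipeline can actually unblock.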

Pipeline Architecture Development

We built the core AI extraction system: for each product, the pipeline searches for official documentation and cross-references against well-structured retailer catalogs to identify the correct attributes. The architecture was designed to be category-agnostic from the start, with configurable layers for attribute type, reference sources, and normalization rules.
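
The configurable layers can be sketched as a per-category configuration object passed into an unchanged core. Everything named here (`CategoryConfig`, `run_pipeline`, the stub extractor, the source labels) is hypothetical; the real AI lookup is replaced by a stand-in function so the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CategoryConfig:
    category: str
    attribute_types: tuple     # which attributes to extract for this category
    reference_sources: tuple   # where the lookup may search (labels only here)
    normalization_rules: dict = field(default_factory=dict)  # raw -> canonical


def run_pipeline(product_name, config, extract):
    """Core flow stays fixed; only the config and extractor vary per category."""
    raw = extract(product_name, config)
    return {
        attr: config.normalization_rules.get(value.lower(), value)
        for attr, value in raw.items()
        if attr in config.attribute_types  # drop attributes not configured
    }


def stub_extract(product_name, config):
    # Stand-in for the AI lookup against documentation and retailer catalogs.
    return {"brand": "samsung", "model": "galaxy s23", "color": "black"}


phone_config = CategoryConfig(
    category="smartphone-accessories",
    attribute_types=("brand", "model"),
    reference_sources=("manufacturer-docs", "retailer-catalogs"),
    normalization_rules={"samsung": "Samsung", "galaxy s23": "Galaxy S23"},
)
attributes = run_pipeline("maska sam s23 crna", phone_config, stub_extract)
```

Moving to a new category then means writing a new `CategoryConfig`, not new pipeline code.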

Agentic Validation Layer Implementation

We added the guardrail system that verifies extracted attributes against known reference databases, flags hallucinations and low-confidence matches, and routes uncertain products for exclusion or manual review. This layer was calibrated iteratively: initial runs revealed specific failure patterns (product line names mistaken for attributes, multi-product listings causing misattribution, and inconsistent naming formats confusing the extraction), which were then addressed with targeted validation rules.
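
The routing logic of such a guardrail can be sketched as a small decision function. This is a simplified illustration, not the production validator: the reference set, the product line names, and the confidence thresholds are hypothetical, and a real system would draw them from the reference databases mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REVIEW = "manual_review"
    EXCLUDE = "exclude"


@dataclass
class Extraction:
    sku: str
    brand: str
    model: str
    confidence: float  # assumed 0..1 score reported by the extraction step


# Hypothetical reference data for illustration.
KNOWN_MODELS = {("Samsung", "Galaxy S23"), ("Apple", "iPhone 13 Pro")}
PRODUCT_LINE_NAMES = {"Armor", "Slim Fit"}


def validate(extraction, accept_at=0.85, review_at=0.6):
    """Route one extraction to accept, manual review, or exclusion."""
    # A product line name mistaken for a model is rejected outright.
    if extraction.model in PRODUCT_LINE_NAMES:
        return Decision.EXCLUDE
    # The brand/model pair must correspond to a real, known entity.
    if (extraction.brand, extraction.model) not in KNOWN_MODELS:
        return Decision.EXCLUDE
    if extraction.confidence >= accept_at:
        return Decision.ACCEPT
    if extraction.confidence >= review_at:
        return Decision.REVIEW
    return Decision.EXCLUDE
```

Checking entity validity before confidence means a confidently hallucinated model is still excluded, which matches the preference for dropping uncertain products over importing bad data.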

Proving Ground: Smartphone Accessories

The 40,000-product smartphone accessories category served as the initial test. We processed the category in batches, starting with 7,500 products to validate the approach, then scaling to the full inventory. The pipeline classified products by type, extracted brand and model attributes, normalized 830 raw model variations to 234 standardized identifiers, and delivered 27,000 attributed products. Approximately 5,000 products that couldn’t be attributed with sufficient confidence were excluded rather than imported with uncertain data.
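
The collapse from many raw model spellings to fewer standardized identifiers can be illustrated with a simple key-based normalizer. This is a rule-based stand-in under stated assumptions, not the AI-driven logic itself: the canonical list, the alias table, and the sample spellings are invented for the example.

```python
import re

# Hypothetical controlled vocabulary and token aliases.
CANONICAL_MODELS = ["iPhone 13 Pro", "Galaxy S23"]
TOKEN_ALIASES = {"iph": "iphone", "glx": "galaxy"}


def compact_key(raw):
    """Lowercase, expand aliases, and strip separators to get a lookup key."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    tokens = [TOKEN_ALIASES.get(token, token) for token in tokens]
    return "".join(tokens)


KEY_TO_MODEL = {compact_key(model): model for model in CANONICAL_MODELS}


def normalize_model(raw):
    """Return the canonical model name, or None when no match is known."""
    return KEY_TO_MODEL.get(compact_key(raw))


raw_variations = ["IPHONE13PRO", "iph 13 pro", "iPhone-13 Pro", "Galaxy S-23", "glx s23"]
normalized = {normalize_model(v) for v in raw_variations}
```

Here five raw spellings reduce to two canonical models; unknown strings return `None` so they can be routed to review instead of being forced into the nearest match.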

Normalization and Faceting Integration

For each processed category, the normalized attribute output is structured to map directly to ePlaneta’s faceted navigation system. The normalization decisions balance search demand (how users search for these attributes), faceting practicality (minimum product count per subcategory), and category architecture (which attribute combinations warrant dedicated pages). The output integrates seamlessly with the existing semi-automated faceting engine.
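
The faceting-practicality rule (a minimum product count per subcategory) can be sketched as a filter over attribute combinations. The URL pattern, the `min_products` value, and the function names are assumptions for illustration and do not describe ePlaneta’s actual faceting engine.

```python
import re
from collections import Counter


def slugify(text):
    """Lowercase and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def facet_pages(products, category_slug, min_products=3):
    """List brand/model subcategories that hold enough products to be useful."""
    counts = Counter((p["brand"], p["model"]) for p in products)
    pages = []
    for (brand, model), count in sorted(counts.items()):
        if count >= min_products:
            path = f"{category_slug}/{slugify(brand)}/{slugify(model)}"
            pages.append({"path": path, "product_count": count})
    return pages


# Illustrative products only.
sample = (
    [{"brand": "Samsung", "model": "Galaxy S23"}] * 3
    + [{"brand": "Apple", "model": "iPhone 13 Pro"}]
)
pages = facet_pages(sample, "phone-cases", min_products=2)
```

Combinations below the threshold produce no page, which avoids thin subcategories while still leaving those products reachable through broader facets.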

Ongoing Category Expansion

With the pipeline proven and the architecture established, we apply it to additional categories on a rolling basis. Each new category requires reconfiguration of reference sources, attribute taxonomies, and normalization rules, but the core extraction and validation infrastructure carries over, making each subsequent category faster to process than the last.