Is it possible to stop indexing products that are published, but not actually reachable on the site?
Let’s say we have a product that is published on all channels, but its main category is not published or is missing a URL. This product is still indexed, but it is not included in the result when the search hits are transformed.
If we have 16 products per page, and one of the first 16 products is a “bad” product, we only get 15 products on that page. If you remove the product from the unpublished category, it works as expected.
I tried to do this in ProductIndexDocumentBuilder, but couldn’t figure out how.
When I tested (on version 7.7.1), I could only reproduce the issue when the URL field was empty for a category. When I unpublished the category, the products were removed from the listing.
If you edit CreateModelPerChannel() in ProductIndexDocumentBuilder, you can make the adjustment below to filter out channels where no URL is available for the main category:
...
if (item.ProductLink.MainCategory)
{
    mainCategorySystemIds[item.Category.AssortmentSystemId] = item.Category.SystemId;
    // Add this line: keep only the channels where the main category has a URL.
    channels = channels.Where(c => !string.IsNullOrEmpty(item.Category.GetUrl(c))).ToList();
}
...
But I would recommend trying another solution if possible. The issue is that if you later add a URL to a category, this will not trigger an index rebuild for the products in any nested category, so those products will not appear on the site until a manual index rebuild is done.
A better solution is probably to add a validation so that a category cannot be added without a URL.
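As a rough illustration of that validation idea, here is a minimal, self-contained sketch. CategoryDraft and CategoryUrlValidator are hypothetical stand-ins, not Litium types. In a real solution the check would need to be hooked into whatever validation extension point your Litium version provides, and it would read the URL from the category's own fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a category being saved; NOT the Litium Category type.
public sealed class CategoryDraft
{
    public string Name { get; set; } = "";

    // URL per culture name, e.g. "en-US" -> "shoes".
    public Dictionary<string, string> UrlByCulture { get; } = new Dictionary<string, string>();
}

// Hypothetical validator; the name and shape are assumptions for illustration only.
public static class CategoryUrlValidator
{
    // Returns one error message for each required culture that has no URL.
    // An empty list means the category can be saved.
    public static List<string> Validate(CategoryDraft category, IEnumerable<string> requiredCultures)
    {
        return requiredCultures
            .Where(culture =>
                !category.UrlByCulture.TryGetValue(culture, out var url)
                || string.IsNullOrWhiteSpace(url))
            .Select(culture => $"Category '{category.Name}' is missing a URL for culture '{culture}'.")
            .ToList();
    }
}
```

The idea is to reject the save whenever the returned list is not empty, so a category without a URL never reaches the index in the first place and no filtering is needed in ProductIndexDocumentBuilder.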