Boost Product Categorization in NestJS Using OpenAI
In modern applications, AI-based solutions can automate and enhance complex tasks, such as product categorization. By integrating AI into your backend systems, you can dynamically categorize products based on text, which can be particularly useful for e-commerce platforms, content aggregation, and other use cases where classification of items is critical.
In this article, I’ll walk through a use case where we categorize a list of words (representing products) into predefined categories using the OpenAI API and NestJS. We’ll also explore how to handle uncategorized products by allowing the AI to suggest new categories dynamically.
Why Use AI for Categorization?
Traditional approaches to categorizing products or content often involve manual work or static rules. These methods can be inefficient, especially when dealing with large, constantly changing datasets. AI offers a dynamic solution that adapts to the data and context, reducing manual work and increasing accuracy.
Here, the Mistral-7B-Instruct large language model, accessed through an OpenAI-compatible API via the official OpenAI SDK, is used to categorize words into predefined categories. Additionally, if the model finds that a product does not fit into one of the existing categories, it can dynamically suggest a new one.
Problem Overview
Let’s assume we have several categories and words that represent various products or entities. Our task is to:
1. Categorize the words into the appropriate predefined categories.
2. If a word does not fit into any of the existing categories, create a new category based on the AI's suggestion (see the example below).
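To make the goal concrete, here is a small, hypothetical example of the inputs and the output shape we are aiming for (the category and word names are made up for illustration):

const categories = ['fruits', 'tools', 'animals'];
const words = ['apple', 'hammer', 'falcon', 'laptop'];

// Desired result: every predefined category gets a bucket, and a word that
// fits nowhere ('laptop') ends up under a new, AI-suggested category:
// {
//   fruits: ['apple'],
//   tools: ['hammer'],
//   animals: ['falcon'],
//   electronics: ['laptop'] // suggested by the model
// }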
Initially, the implementation was structured so that the AI was queried in a loop, with each word sent as a separate request to the API. This was inefficient because every word consumed an API call, quickly exhausting a limited trial quota.
Initial Approach: Looping Through Words
const categorizedWords = categories.reduce((acc, category) => {
  acc[category] = [];
  return acc;
}, {});

for (const word of words) {
  const chatCompletion = await openai.chat.completions.create({
    model: 'mistralai/Mistral-7B-Instruct-v0.2',
    messages: [
      {
        role: 'system',
        content: `You are an AI model trained to categorize items into the following categories: ${categories.join(', ')}. If an item does not fit any category, suggest a new one.`,
      },
      {
        role: 'user',
        content: `Which category does the word "${word}" fit into?`,
      },
    ],
    temperature: 0.7,
    max_tokens: 128,
  });
  // Process response and categorize the word…
}
This code runs a loop for every word, querying the OpenAI API individually, which could quickly lead to a high number of API calls, exhausting the allowed quota.
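Inside that loop, the omitted per-word handling might have looked something like the sketch below; this is a reconstruction for comparison, not the original code:

// Hypothetical per-word handling, placed before the loop's closing brace
const answer = chatCompletion.choices[0].message.content.trim().toLowerCase();
// Try to match the model's free-form reply against a known category
const matched = categories.find(category => answer.includes(category.toLowerCase()));
const target = matched ?? answer; // Fall back to treating the reply as a new category
if (!categorizedWords[target]) {
  categorizedWords[target] = [];
}
categorizedWords[target].push(word);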
Optimized Approach: Sending All Words in One API Call
To mitigate the problem of excessive API calls, we decided to optimize the code by sending all words in a single request, allowing the AI to categorize them all at once. This reduces the total number of API calls and preserves our API quota.
Here’s the optimized version:
The Optimized Code
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { OpenAI } from 'openai';

@Injectable()
export class AiService {
  constructor(private configService: ConfigService) {}

  async categorizeWords(categories: string[], words: string[]): Promise<any> {
    try {
      const openai = new OpenAI({
        apiKey: this.configService.get('AI_API_KEY'),
        baseURL: this.configService.get('AI_URL'),
      });

      // Build a single prompt for all the words
      const prompt = `You are an AI model trained to categorize items into the following categories: ${categories.join(', ')}. If an item does not fit any category, suggest a new one. Please categorize the following words: ${words.join(', ')}.`;

      const chatCompletion = await openai.chat.completions.create({
        model: 'mistralai/Mistral-7B-Instruct-v0.2',
        messages: [
          {
            role: 'system',
            content: prompt,
          },
        ],
        temperature: 0.7,
        max_tokens: 512, // Increased to handle larger responses
      });

      // Extract and clean up the AI's response
      const aiResponse = chatCompletion.choices[0].message.content.trim();

      // Prepare a bucket for every predefined category
      const categorizedWords = categories.reduce((acc, category) => {
        acc[category] = [];
        return acc;
      }, {});

      // Parse the AI response: one "word: category" pair per line
      aiResponse.split('\n').forEach(line => {
        const [word, category] = line.split(':').map(s => s.trim().toLowerCase());
        if (!word || !category) {
          return; // Skip empty or malformed lines
        }
        if (categories.includes(category)) {
          categorizedWords[category].push(word);
        } else {
          // The model suggested a category that is not in the predefined list
          if (!categorizedWords[category]) {
            categorizedWords[category] = [];
          }
          categorizedWords[category].push(word);
        }
      });

      console.log('Categorized Words:', categorizedWords);
      return categorizedWords;
    } catch (error) {
      // Improved error handling
      console.error('Error communicating with AIML API', error);
      if (error.response) {
        console.error('Response data:', error.response.data);
        console.error('Response status:', error.response.status);
      } else if (error.request) {
        console.error('Request data:', error.request);
      } else {
        console.error('Error message:', error.message);
      }
      throw new Error('Failed to categorize words');
    }
  }
}
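To show how the service might be consumed, here is a minimal controller sketch; the route name, file path, and request body shape are assumptions for illustration, not part of the original code:

import { Body, Controller, Post } from '@nestjs/common';
import { AiService } from './ai.service';

@Controller('categorization')
export class CategorizationController {
  constructor(private readonly aiService: AiService) {}

  // POST /categorization with a body like { "categories": [...], "words": [...] }
  @Post()
  async categorize(@Body() body: { categories: string[]; words: string[] }) {
    return this.aiService.categorizeWords(body.categories, body.words);
  }
}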
Explanation
1. Single Prompt Request: Instead of looping through each word, we send all words to the AI in one go using a single prompt. This dramatically reduces the number of API calls.
const prompt = `You are an AI model trained to categorize items into the following categories: ${categories.join(', ')}. If an item does not fit any category, suggest a new one. Please categorize the following words: ${words.join(', ')}.`;
2. Improved Token Management: We increased max_tokens to 512 so the model has room for a longer reply, which is necessary when many words are categorized in a single response.
max_tokens: 512,
3. Response Parsing: After receiving the response, we split it by lines and process each word-category pair. Any category the model returns that is not in the predefined list is added as a new bucket, so dynamically suggested categories are preserved (a sample of the expected response format is shown after this list).
4. Error Handling: We added detailed error handling to capture and log any issues that occur during the API call. This will help to identify the source of the issue, whether it’s a problem with the request, the server response, or even connectivity.
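The parsing in step 3 assumes the model replies with one "word: category" pair per line. A hypothetical reply that matches this expectation is shown below; in practice, you may need to extend the prompt with an explicit instruction such as "answer with one line per word in the format word: category" so real responses follow this shape:

// Hypothetical raw model output that the line-by-line parser expects
const aiResponse = [
  'apple: fruits',
  'hammer: tools',
  'falcon: animals',
  'laptop: electronics', // a category the model suggested on its own
].join('\n');
// After parsing, 'electronics' becomes a new bucket because it is not
// in the predefined categories array.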
Cleaning AI’s Response
When working with OpenAI responses, the output may sometimes include extraneous characters such as \n or spaces that need to be cleaned. This can be easily done using JavaScript string methods.
For example, after splitting the AI’s response:
aiResponse.split('\n').forEach(line => {
  const [word, category] = line.split(':').map(s => s.trim().toLowerCase());
  if (!word || !category) {
    return; // Skip empty or malformed lines
  }
  if (categories.includes(category)) {
    categorizedWords[category].push(word);
  } else {
    // The model suggested a category that is not in the predefined list
    if (!categorizedWords[category]) {
      categorizedWords[category] = [];
    }
    categorizedWords[category].push(word);
  }
});
The trim() method removes unwanted surrounding whitespace, while split() first breaks the response into lines and then separates each word from its category.
Conclusion
By sending all words to the AI in a single API call, we drastically reduce the number of requests, optimize performance, and make the most out of limited API usage. This method also ensures that uncategorized products are dynamically assigned new categories, making the AI’s classification much more adaptable and intelligent.
This approach can be extended beyond product categorization to any content classification task where static rules are insufficient. By harnessing the power of AI in this way, you can build smarter, more efficient systems that scale with the complexity and size of your data.