A German RAG-based Product Advisor — Leveraging the Power of LLMs in Sales Pipelines
The artificial intelligence-based digital employee SUSI from SUSI & James GmbH takes over and optimizes a wide variety of communication-heavy business processes in companies. SUSI answers every call, around the clock, so that no call goes unanswered. In doing so, it creates space and time for service-intensive issues, and employees can once again concentrate fully on value-adding activities. In this article, we highlight the use of SUSI as a Digital Product Advisor and show how to combine traditional NLP with the power of LLMs and Generative AI in a way that ensures the best possible customer experience.

The following demo is about a consultation between a digital employee and a user who is interested in buying a new car. SUSI guides the user through a three-step process: first, the user provides general information about the desired vehicle; then, based on a suggested selection of eligible vehicles, the user can learn more about these vehicles by asking specific questions; once the user has decided on one or more vehicles, the last step is to arrange a test drive or a purchase appointment.

Step 0 — Preparing the data and making some basic considerations

The basis for product consulting is the product data, in this case about 7,000 current vehicles, which are available in JSON format (shown here in simplified form):

{
  "name": "A3",
  "car_model_id": 82,
  "car_trim_id": 25295,
  "car_series_id": 2901,
  "car_production_year": "2016",
  "car_series_name": "Sportback Schrägheck 5-Tür",
  "car_trim_name": "1.2 TFSI MT (105 ps)",
  "car_specifications": {
    "Anzahl der Gänge": "6",
    "Schadstoffeinstufung": "EURO V",
    "Fahrzeugart": "Schrägheck",
    "Sitzplätze": "5",
    "Breite": "1785",
    "Radstand": "2636",
    "Spur vorne": "1535",
    "Kofferraumvolumen minimal": "380",
    "Kofferraumvolumen maximal": "380",
    "Spur hinten": "1506",
    "Kraftstoffe": "Benzin",
    "Hubraum": "1197",
    "Wendekreisdurchmesser": "10.9",
    "Zuladung": "485",
    "Anhängelast (gebremst)": "3240",
    "Ladehöhe": "677"
  },
  "car_options": {
    "Attraction": [
      "Elektromechanische Servolenkung mit variabler Kraft auf das lenkrad",
      "Servotronic (aktives lenkrad)",
      "Schallschutz Verglasungen",
      "Start-stop-System",
      "Mechanische Vordersitze einstellen"
    ],
    "Ambiente": [...],
    "Ambition": [...]
  },
  "manufacturer": "Audi"
}

When developing a Retrieval or Retrieval-Augmented Generation (RAG) approach for a dataset containing detailed car (or any other product) information, it is crucial to consider various factors to ensure an effective and accurate system. Here is a breakdown of the aspects to consider:

Understanding the Data Structure
Understanding the dataset's structure is foundational. The given dataset is in a structured format with nested fields and, in particular, mixed data types, which requires a clear understanding to accurately retrieve or generate the necessary information. Depending on the heterogeneity of the data points, aspects such as the presence of contextual information, digits, and mixed data must be taken into account, especially when vectorizing.

Indexing Strategy
Proper indexing is crucial for effective retrieval. A robust indexing strategy ensures quick and accurate retrieval of car details based on parameters such as car model, production year, or manufacturer.

Query Formulation
Query formulation should be designed to handle a variety of search terms and parameters. It should be robust enough to accommodate different types of queries and return relevant results.

Retrieval-Augmented Generation
For RAG, integrating retrieval mechanisms with generation models requires a seamless flow of information: the retrieval process must feed relevant data to the generation model to facilitate meaningful output. Special attention should also be paid to the target language, since even capable LLMs do not perform equally well in every language.

Evaluation Metrics
Evaluation metrics are necessary to measure the performance of the retrieval and generation systems. They could include accuracy, recall, precision, and F1 score, among others. For a qualitative analysis of the entire pipeline, individual metrics are usually inadequate; frameworks like Ragas can fill this gap.

Scalability
The system should be designed to scale with growing data or increased query loads without compromising performance.
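To make these considerations a little more concrete, the sketch below shows one possible way to prepare a single vehicle record for indexing: the string-valued, mixed-type specifications are normalized into typed, filterable fields, while the descriptive parts are concatenated into text on which embeddings can later be generated. This is a simplified illustration under our own assumptions; the helper name and the exact field selection are not taken from the production pipeline.

from typing import Any, Dict, Tuple

def prepare_vehicle(record: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Split a raw vehicle record into embeddable text and typed, filterable metadata."""
    specs = record.get("car_specifications", {})

    # Numeric specifications arrive as strings ("Sitzplätze": "5") and have to be
    # cast so that equality and range filters work on them later.
    meta = {
        "manufacturer": record["manufacturer"],
        "vehicle_name": record["name"],
        "production_year": int(record["car_production_year"]),
        "number_of_seats": int(specs.get("Sitzplätze", 0)),
        "trunk_volume_minimum": int(specs.get("Kofferraumvolumen minimal", 0)),
        "fuels": specs.get("Kraftstoffe", ""),
    }

    # Descriptive, free-text information becomes the content on which textual
    # embeddings can be generated.
    options = [
        option
        for package in record.get("car_options", {}).values()
        for option in package
        if isinstance(option, str)
    ]
    content = " ".join(
        [record["manufacturer"], record["name"], record["car_series_name"],
         record["car_trim_name"], *options]
    )
    return content, meta

# Example usage with the parsed JSON record shown above:
# content, meta = prepare_vehicle(record)

Which fields end up as metadata and which end up as content is exactly the design decision discussed next.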
The Haystack framework and the SmartOffice form the basis for the steps above. In the example shown, it quickly becomes clear that an initial selection of vehicles to be proposed can and should be based on a number of metadata fields such as manufacturer, fuel consumption, and number of seats. A classic document is represented in Haystack as follows:

class Document:
    content: Union[str, pd.DataFrame]
    content_type: Literal["text", "table", "image"]
    id: str
    meta: Dict[str, Any]
    score: Optional[float] = None
    embedding: Optional[np.ndarray] = None
    id_hash_keys: Optional[List[str]] = None

The question therefore quickly arises as to which parts of the JSON are defined as content, on which textual embeddings will later be generated, and which data becomes part of the metadata. As for the choice of database (or DocumentStore), we compared Weaviate and ElasticSearch. Both systems offer similar advantages and disadvantages; in particular, the advanced filter logic of ElasticSearch tipped the scales for prototyping.

Step 1 — Designing Prompts for Basic Retrieval

In this step, the focus is on how to derive and extract, from almost arbitrary natural-language statements, the criteria on which meaningful filtering can be performed. The statement pictured in SUSI's introduction translates as: "I'd like a BMW or an Audi for a family of five with a dog." The filter we need to construct from this in Haystack-compatible format looks like this:

{
  "$and": {
    "number_of_seats": {"$eq": 5},
    "trunk_volume_minimum": {"$gte": 400},
    "manufacturer": {"$in": ["BMW", "Audi"]}
  }
}
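As a minimal sketch of how such a filter plugs into Haystack 1.x, the following snippet assumes a locally running Elasticsearch instance; the index name and the heavily shortened example document are illustrative, not our production setup.

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.schema import Document

# Assumes an Elasticsearch instance reachable on localhost:9200.
document_store = ElasticsearchDocumentStore(host="localhost", index="vehicles")

# One heavily shortened vehicle document with filterable attributes in its metadata.
document_store.write_documents([
    Document(
        content="Audi A3 Sportback Schrägheck 5-Tür, 1.2 TFSI MT (105 ps) ...",
        meta={
            "manufacturer": "Audi",
            "number_of_seats": 5,
            "trunk_volume_minimum": 380,
        },
    )
])

# The filter constructed from the user's statement can be passed directly to the store.
filters = {
    "$and": {
        "number_of_seats": {"$eq": 5},
        "trunk_volume_minimum": {"$gte": 400},
        "manufacturer": {"$in": ["BMW", "Audi"]},
    }
}
candidate_vehicles = document_store.get_all_documents(filters=filters)

The same filter dictionary can also be passed to a Haystack retriever, so that metadata filtering and embedding-based retrieval can be combined in a single query.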
To derive such a filter from the user's statement, we evaluated several approaches, including:

Text2SQL approach: a natural language processing (NLP) approach that translates natural-language queries into SQL (Structured Query Language) queries.

Span extraction approach: an NLP task whose goal is to identify and extract a specific portion of text, a so-called "span," from a larger body of text based on a given query or condition.

Generative approach: the assumption that large language models, given a specifying prompt, have sufficient knowledge about identifying and extracting specific information from a given text to satisfy the aforementioned requirement in the form of generated output.

After a short prototyping phase of these approaches, we decided to use the generative approach. The shortlisted models were:

1. GPT-4
2. Falcon-180B
3. LeoLM-13b

After a prompt engineering phase, the following prompt proved to be a good fit (partially shown here):

I need you to create a json serialized python dictionary as string that is derived from a user input. In the user input there are filter criteria that can be expressed in different ways. The user can express himself on the following parameters: ['vehicle_type', 'seats', 'width', 'length', 'height', 'trunk_volume_minimum', 'fuels', 'engine_volume', 'engine_power', 'gasoline_grade', 'transmission', 'fuel_consumption_combined_at_100_km', 'total_weight', 'manufacturer', 'vehicle_name'].
The following logical operators are available: $and, $or
The following comparison operators are available: $eq, $in, $gt, $gte, $lt, $lte

The GPT-4 and Falcon-180B models have shown