Knowledgebase Encoding Issues

Modified on Thu, 12 Dec at 2:06 PM

AI Knowledgebase Encoding


The Talkative Knowledgebase feature can scrape websites for content, and it also lets you upload files for the AI tool to use. Uploaded files should be encoded as UTF-8.


Encoding Failure Emails


When we refresh a knowledgebase, we generate embeddings from your data for the search tool to use. The content used to generate embeddings must be valid UTF-8, and generation will fail if any non-compliant characters are detected.

If your file is saved as UTF-8 and you still receive an email reporting an encoding error, the most likely cause is a special character that was pasted into the text document.
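If you are comfortable running a script, the following minimal sketch shows one way to locate the offending bytes before you upload a file. It is an illustration only, not part of the Talkative product; the function name `find_invalid_utf8` is hypothetical.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for each span that is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means nothing else is invalid.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning after the bad span
    return problems
```

To check a file, read it in binary mode, for example `find_invalid_utf8(open("faq.txt", "rb").read())` (the filename is a placeholder). An empty list means the file decodes cleanly as UTF-8.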

When you create the text files for your knowledgebase, you might copy text from Word documents and other websites. These sources may use a different encoding and can introduce “badly encoded characters”. If a file contains any badly encoded characters, embedding generation will fail and the knowledgebase will not refresh.
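As an illustration of how this happens, a curly quote saved by an older Windows application is often stored as a single byte that is not valid UTF-8. The sketch below re-encodes such content as UTF-8. It assumes the original text was saved as Windows-1252 (`cp1252`), which is a guess you should confirm for your own files; the function name `to_utf8` is hypothetical.

```python
def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return data as UTF-8 bytes, re-encoding from source_encoding if needed."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8, so leave it untouched
    except UnicodeDecodeError:
        # Assumption: the text was originally saved in source_encoding.
        # This raises UnicodeDecodeError if a byte is undefined in that encoding.
        return data.decode(source_encoding).encode("utf-8")
```

If the assumed source encoding is wrong, the result will still be valid UTF-8 but may show the wrong characters, so review the converted text before uploading it.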

To avoid this, remove the formatting from any content you paste into a text file. You can do this by pasting with the Ctrl + Shift + V shortcut, which strips out formatting. Symbols such as bullet points might also cause issues, so please replace them with standard dash characters (-).
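For larger files, the bullet replacement can be scripted. The sketch below swaps a handful of bullet-like symbols for a dash. The set of symbols in `BULLET_LIKE` is an assumption chosen for illustration, and `replace_bullets` is a hypothetical helper; extend the set to match the symbols in your own content.

```python
# Bullet-like symbols to replace (an illustrative set, not an exhaustive one):
# bullet, white bullet, small black square, triangular bullet, middle dot.
BULLET_LIKE = "\u2022\u25e6\u25aa\u2023\u00b7"

# Translation table mapping each symbol to a plain ASCII dash.
_BULLET_TABLE = str.maketrans({ch: "-" for ch in BULLET_LIKE})


def replace_bullets(text: str) -> str:
    """Replace bullet-like symbols in text with a standard dash (-)."""
    return text.translate(_BULLET_TABLE)
```

After replacing the symbols, save the result as UTF-8, for example with `open(path, "w", encoding="utf-8")`.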

Another way to remove poorly encoded characters is to paste the text into a text file, then open a new blank text file and copy the content into it. Doing this will strip out any poorly encoded characters.
