AI Knowledgebase Encoding
The Talkative Knowledgebase feature can scrape websites for content, and it also lets you upload files for the AI tool to use. Uploaded files should be encoded as UTF-8.
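If a file was saved in another encoding, you can re-save it as UTF-8 before uploading. The following is a minimal Python sketch, not part of the Talkative product. It assumes the original file uses the Windows cp1252 encoding, which is a common default for files created on Windows. The function name convert_to_utf8 and its parameters are hypothetical, so substitute the encoding your file actually uses.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8 (hypothetical helper).

    source_encoding is an assumption: set it to the encoding the file
    was really saved in. cp1252 is only a common Windows default.
    """
    # Decode with the original encoding. This raises UnicodeDecodeError
    # if the bytes are not valid in that encoding.
    text = Path(src).read_text(encoding=source_encoding)
    # Write the same characters back out, this time encoded as UTF-8.
    Path(dst).write_text(text, encoding="utf-8")
```

For example, convert_to_utf8("faq_windows.txt", "faq_utf8.txt") would write a UTF-8 copy of a hypothetical cp1252 file. The conversion succeeds only if the source encoding is guessed correctly. A wrong guess either raises an error or produces incorrect characters, so check the output before uploading it.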
Encoding Failure Emails
When we refresh a knowledgebase, we generate embeddings from your data for the search tool to use. The content used to generate embeddings must be valid UTF-8, and generation fails if any non-compliant characters are detected.
If your file is UTF-8 encoded but you still receive a message reporting an encoding error, the most likely cause is a special character that was pasted into the text document.
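To find exactly where a file stops being valid UTF-8, you can decode it strictly and report the first failure. The following minimal Python sketch is not a Talkative tool. The helper names first_invalid_utf8 and check_file are hypothetical, and the sketch assumes that the first invalid byte sequence is a useful place to start looking.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for the first invalid UTF-8
    sequence in data, or None if data is entirely valid UTF-8."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None


def check_file(path: str) -> None:
    """Print whether a file is valid UTF-8 and, if not, where it fails."""
    data = Path(path).read_bytes()
    problem = first_invalid_utf8(data)
    if problem is None:
        print(f"{path}: valid UTF-8")
    else:
        offset, bad = problem
        # Count newlines before the offset to report a 1-based line number.
        line = data.count(b"\n", 0, offset) + 1
        print(f"{path}: invalid byte(s) {bad!r} at offset {offset} (line {line})")
```

Running check_file("knowledgebase.txt") on a hypothetical file prints either a confirmation or the line that contains the first bad sequence. Fix that line and run the check again until the file passes.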
When you create text files for the knowledgebase, you might copy text from Word documents and other websites. These sources may use different encodings and can introduce badly encoded characters. If a file contains any badly encoded characters, embedding generation fails and the knowledgebase does not refresh.
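To review characters that may have come in through copy and paste, you can list every non-ASCII character with its position and Unicode name. Not every non-ASCII character is a problem, because accented letters are perfectly legitimate. However, the list makes pasted symbols such as smart quotes or bullets easy to spot. The following is a minimal Python sketch, and the function name non_ascii_chars is hypothetical.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[int, int, str, str]]:
    """List (line, column, character, Unicode name) for each non-ASCII
    character in text, using 1-based line and column numbers."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # unicodedata.name raises ValueError for code points that
                # have no name, so a default is supplied.
                found.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return found
```

For example, non_ascii_chars("ok\n\u2022 item") returns [(2, 1, "\u2022", "BULLET")]. Because the function takes an already-decoded string, the file must first be read successfully as UTF-8, for example with Path("notes.txt").read_text(encoding="utf-8") on a hypothetical file.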
To avoid this, remove all formatting from any content you paste into a text file. You can do this by pasting with the Ctrl + Shift + V shortcut, which strips out the formatting. Symbols such as bullet points can also cause issues, so replace them with standard dash characters (-).
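If you have many files, you can replace bullet symbols with dashes in a script instead of editing each file by hand. The following is a minimal Python sketch. The set of bullet characters in BULLET_REPLACEMENTS is an assumption that covers symbols word processors and web pages commonly produce, so extend it to match your own content. The names BULLET_REPLACEMENTS and replace_bullets are hypothetical.

```python
# Hypothetical mapping: common bullet-style symbols, each replaced with a
# plain ASCII dash. Add any other symbols that appear in your content.
BULLET_REPLACEMENTS = str.maketrans({
    "\u2022": "-",  # BULLET
    "\u25e6": "-",  # WHITE BULLET
    "\u25aa": "-",  # BLACK SMALL SQUARE
    "\u2023": "-",  # TRIANGULAR BULLET
    "\u2043": "-",  # HYPHEN BULLET
})


def replace_bullets(text: str) -> str:
    """Return text with the mapped bullet symbols swapped for dashes."""
    return text.translate(BULLET_REPLACEMENTS)
```

For a single file, you could apply it with Path(p).write_text(replace_bullets(Path(p).read_text(encoding="utf-8")), encoding="utf-8"), where p is a hypothetical path and Path comes from pathlib. Using str.translate replaces every mapped character in one pass.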
Another way to remove poorly encoded characters is to paste the text into a text file, then open a new blank text file and copy the content into it. Doing this strips out any poorly encoded characters.
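A scripted alternative to the copy-into-a-new-file approach is to decode the file as UTF-8 and discard any byte sequences that are not valid. The following is a minimal Python sketch, and the helper names strip_invalid_utf8 and clean_file are hypothetical. It assumes that dropping the invalid bytes is acceptable. The affected characters are removed silently rather than repaired, so review the cleaned file afterwards.

```python
from pathlib import Path


def strip_invalid_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, discarding any invalid byte sequences."""
    # errors="ignore" skips bytes that are not valid UTF-8 instead of raising.
    return data.decode("utf-8", errors="ignore")


def clean_file(src: str, dst: str) -> int:
    """Write a valid UTF-8 copy of src to dst and return how many bytes
    were dropped."""
    data = Path(src).read_bytes()
    cleaned = strip_invalid_utf8(data).encode("utf-8")
    # write_bytes avoids any newline translation, so only invalid bytes change.
    Path(dst).write_bytes(cleaned)
    return len(data) - len(cleaned)
```

A non-zero return value from clean_file shows that something was removed. In that case, compare the two files to check whether any meaningful text was lost.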