QnA Not working as expected

Hi folks.

Since QnA doesn’t seem to work in a bot that also has other flows, my workaround has always been to create a bot that handles only QnA and then use interbot communication to connect to that QnA bot.

However, I tried to do that this morning to no avail. The QnA has 229 unique entries (289 KB), all with the recommended 10+ sample questions. The questions are an amalgamation of the work of five different data capture folks, who entered them manually on different bots, exported them, and sent me the JSON files. I then imported those into this bot.

On training, the model (claims to have) trained successfully. However, when I type in anything verbatim from a sample question, I get this, along with the default message that I set on event.nlu.intent.name === 'none'

image

At the bottom, I get this red gear icon, which doesn’t restart the server as it promises.

When I refresh the browser, I get a red NLU performance icon (the box). If I retrain, my score is 17% (I don’t know if that’s good or bad).

My CLI output looks like this:
See below from @Pierre Paul

Please help

Just reformatting the error log to make it more readable:

could not train nlu model URIError: URI malformed
at encodeURI (<anonymous>)
at getCacheKey (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:407:41)
at tokens.forEach (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:412:41)
at Array.forEach (<anonymous>)
at RemoteLanguageProvider.vectorize (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:409:12)
at RemoteLanguageProvider.generateSimilarJunkWords (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:367:18)
at Object.generateSimilarJunkWords (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\index.js:84:61)
at AppendNoneIntents (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\engine2\engine2.js:832:33)
at Trainer (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\engine2\engine2.js:602:20)
11:51:02.824 Mod[nlu] Retraining model for bot budqna…
11:51:05.143 Mod[nlu] Error training NLU model [URIError, URI malformed]
STACK TRACE
URIError: URI malformed
at encodeURI (<anonymous>)
at getCacheKey (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:407:41)
at tokens.forEach (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:412:41)
at Array.forEach (<anonymous>)
at RemoteLanguageProvider.vectorize (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:409:12)
at RemoteLanguageProvider.generateSimilarJunkWords (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:367:18)
at SVMClassifier.train (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\pipelines\intents\svm_classifier.js:160:53)
11:51:05.926 Mod[nlu] Error training NLU model [URIError, URI malformed]
STACK TRACE
URIError: URI malformed
at encodeURI (<anonymous>)
at getCacheKey (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:407:41)
at tokens.forEach (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:412:41)
at Array.forEach (<anonymous>)
at RemoteLanguageProvider.vectorize (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:409:12)
at RemoteLanguageProvider.generateSimilarJunkWords (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\language-provider.js:367:18)
at SVMClassifier.train (C:\Projects\Debug16\BudQNA\modules.cache\module__baa284026cdaf42fcc6ce79f3648b2b5cc81694e29c4fddbfef853846efc8bde\dist\backend\pipelines\intents\svm_classifier.js:160:53)
12:00:42.799 Mod[nlu] === Confusion Matrix ===
12:00:42.799 Mod[nlu] F1: 0.17836183697100366 P1: 0.38482199117832383 R1: 0.13231698253437396
12:00:45.039 Mod[nlu] Model training completed for bot budqna

Thanks mate, I always trip myself up in Markdown!

Seems like the language provider is trying to hit an invalid language server endpoint. Do you happen to host your own language server, or did you change the NLU configuration? Can you make sure you have a valid languageSources array in your NLU config?

{
  /**
   * The list of sources to load languages from
   * @default [{ "endpoint": "https://lang-01.botpress.io" }]
  */
  languageSources: LanguageSource[]
}
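If it helps, here is a rough sketch of how the endpoints in that array could be sanity-checked before blaming the server. It only assumes the `endpoint` field shown in the snippet above; the `http://localhost:3100` address and the `invalidEndpoints` helper are made up for illustration and are not part of the NLU module.

```typescript
// Only the `endpoint` field from the config excerpt is assumed here.
interface LanguageSource {
  endpoint: string
}

// Hypothetical config value: the documented default plus an assumed
// address for a self-hosted language server.
const languageSources: LanguageSource[] = [
  { endpoint: 'https://lang-01.botpress.io' },
  { endpoint: 'http://localhost:3100' } // assumption: local language server
]

// Returns every endpoint that does not parse as an absolute http(s) URL.
function invalidEndpoints(sources: LanguageSource[]): string[] {
  return sources
    .map(source => source.endpoint)
    .filter(endpoint => {
      try {
        const url = new URL(endpoint)
        // 'localhost:3100' parses, but with 'localhost:' as the scheme.
        return url.protocol !== 'http:' && url.protocol !== 'https:'
      } catch {
        return true // not parseable as a URL at all
      }
    })
}

console.log(invalidEndpoints(languageSources))
```

An empty result only means the endpoints are well-formed URLs; it does not show that a language server is actually reachable at them.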

I am using self-hosted instances of both Duckling and the language server. I downloaded the language server back on bp11.x (if that makes a difference). The only thing I changed was autoTrainInterval, which I raised from 30s to 55s.

I am not getting this error on smaller datasets though.

Seems like your larger dataset contains unsupported characters (ranging from U+D800 to U+DFFF). I’ll come up with a fix for this. In the meantime, you might want to look for those characters.
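For hunting those characters down, something along these lines might work. It is only a sketch: the `QnaEntry` shape and the helper names are hypothetical, and the real export format may differ. It relies on the fact that `encodeURI` throws `URIError: URI malformed` when a string contains a lone surrogate, meaning a code unit in U+D800 to U+DFFF that is not part of a valid high/low pair. Properly paired surrogates (most emoji, for example) encode fine.

```typescript
// Hypothetical shape of one QnA entry; adjust to the actual export.
interface QnaEntry {
  id: string
  questions: string[]
}

// Index of the first unpaired surrogate code unit, or -1 if there is none.
function loneSurrogateIndex(text: string): number {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: valid only when a low surrogate follows.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++ // skip the low half of a valid pair
        continue
      }
      return i
    }
    if (code >= 0xdc00 && code <= 0xdfff) {
      return i // low surrogate with no high surrogate before it
    }
  }
  return -1
}

// Lists every question that would make encodeURI throw.
function findBadQuestions(entries: QnaEntry[]): { id: string; question: string; index: number }[] {
  const bad: { id: string; question: string; index: number }[] = []
  for (const entry of entries) {
    for (const question of entry.questions) {
      const index = loneSurrogateIndex(question)
      if (index !== -1) {
        bad.push({ id: entry.id, question, index })
      }
    }
  }
  return bad
}

// Made-up sample data: the second entry ends in half of an emoji.
const sample: QnaEntry[] = [
  { id: 'q1', questions: ['How do I reset my password?', 'Thanks \uD83D\uDE00'] },
  { id: 'q2', questions: ['Where is my order\uD83D'] }
]

console.log(findBadQuestions(sample).map(b => `${b.id} @ ${b.index}`))
```

Truncated emoji are a plausible source when text is typed or pasted by hand and then cut or merged across exports, but that is a guess about this dataset rather than something the log confirms.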

Okay will wait for the rewrite then. No need to invent two wheels.