Why is my Sphinx4 recognition poor?

Stack Overflow user

Asked 2015-06-23 18:03:12

Answers: 1, Views: 415, Followers: 0, Votes: 1

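I am learning how to use Sphinx4 through the Maven plugin.

I took the transcription sample I found on GitHub and modified it to process my own file. The audio file is 16-bit, mono, 16 kHz. It is about 13 seconds long. I noticed that it sounds as if it were playing in slow motion.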
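One way to confirm that a file really is 16-bit, mono, 16 kHz is to read its header with the standard `javax.sound.sampled` API. The following is a minimal sketch, not part of the original sample: the class name `WavFormatCheck`, the helper `isSphinxReady`, and the default path `input.wav` are hypothetical, and the check only reports what the header claims, not whether the samples were actually recorded at that rate.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavFormatCheck {

    // Hypothetical helper: true when the format is signed PCM,
    // 16 kHz, 16-bit, mono (the format described in the question).
    public static boolean isSphinxReady(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1;
    }

    public static void main(String[] args) throws Exception {
        // Assumed path; pass your own file as the first argument.
        File wav = new File(args.length > 0 ? args[0] : "input.wav");
        if (!wav.isFile()) {
            System.out.println("Usage: java WavFormatCheck <file.wav>");
            return;
        }
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            System.out.println("Header says: " + f);
            System.out.println("16 kHz, 16-bit, mono PCM: " + isSphinxReady(f));
        }
    }
}
```

If the header reports 16 kHz but the audio still sounds slowed down, the samples were probably captured at a higher rate and only relabeled, which a header check alone cannot detect.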

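The words spoken in the file are: "also make sure that you have easy access to the recording file so that you can upload it when asked".

I tried to transcribe the file, and my results are horrendous. My attempts to find forum posts or links that thoroughly explain how to improve the results, or what I am doing incorrectly, have led me nowhere.

I want to improve the transcription accuracy, but I would like to avoid training a model myself because of the variance in the type of data my current project has to deal with. Is this not possible? Is the code I am using off?

Code

(Note: the audio file is available at https://instaud.io/8qv)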

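import java.io.File;
import java.io.FileInputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.decoder.adaptation.Stats;
import edu.cmu.sphinx.decoder.adaptation.Transform;
import edu.cmu.sphinx.result.WordResult;

public class App {

public static void main(String[] args) throws Exception {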
    System.out.println("Loading models...");

    Configuration configuration = new Configuration();

    // Load model from the jar
    configuration
            .setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");

    // You can also load model from folder
    // configuration.setAcousticModelPath("file:en-us");

    configuration
            .setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
    configuration
            .setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

    StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(
            configuration);
    FileInputStream stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/vocaroo_test_revised.wav"));
    // stream.skip(44); I commented this out due to the short length of my file

    // Simple recognition with generic model
    recognizer.startRecognition(stream);
    SpeechResult result;

    while ((result = recognizer.getResult()) != null) {
        // I added the following print statements to get more information
        System.out.println("\ngetWords() before loop: " + result.getWords());
        System.out.format("Hypothesis: %s\n", result.getHypothesis());
        System.out.print("\nThe getResult(): " + result.getResult() 
                + "\nThe getLattice(): " + result.getLattice()); 

        System.out.println("List of recognized words and their times:");
        for (WordResult r : result.getWords()) {
            System.out.println(r);
        }

        System.out.println("Best 3 hypothesis:");
        for (String s : result.getNbest(3))
            System.out.println(s);

    }
    recognizer.stopRecognition();

    // Live adaptation to speaker with speaker profiles


    stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
    // stream.skip(44); I commented this out due to the short length of my file

    // Stats class is used to collect speaker-specific data
    Stats stats = recognizer.createStats(1);
    recognizer.startRecognition(stream);
    while ((result = recognizer.getResult()) != null) {
        stats.collect(result);
    }
    recognizer.stopRecognition();

    // Transform represents the speech profile
    Transform transform = stats.createTransform();
    recognizer.setTransform(transform);

    // Decode again with updated transform
    stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
    // stream.skip(44); I commented this out due to the short length of my file
    recognizer.startRecognition(stream);
    while ((result = recognizer.getResult()) != null) {
        System.out.format("Hypothesis: %s\n", result.getHypothesis());
    }
    recognizer.stopRecognition();


    System.out.println("...Printing is done..");
}
}
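The commented-out `stream.skip(44)` call in the code above skips a WAV header, which is often 44 bytes long but is not guaranteed to be; it is unrelated to how long the recording is. A hedged alternative is to let `javax.sound.sampled` parse the header and return a stream positioned at the first sample. The sketch below uses only JDK classes; the class name `PcmOpener`, the method `openPcm`, and the path `input.wav` are hypothetical. Because `AudioInputStream` is an `InputStream`, it should be usable wherever the code above passes a `FileInputStream` to `startRecognition`, but that combination is an assumption and has not been tested against Sphinx4 here.

```java
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

public class PcmOpener {

    // Hypothetical helper: opens a WAV file and returns a stream that
    // yields only the audio payload, with the header already consumed.
    public static AudioInputStream openPcm(File wav)
            throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioInputStream(wav);
    }

    public static void main(String[] args) throws Exception {
        // Assumed path; pass your own file as the first argument.
        File wav = new File(args.length > 0 ? args[0] : "input.wav");
        if (!wav.isFile()) {
            System.out.println("Usage: java PcmOpener <file.wav>");
            return;
        }
        try (AudioInputStream pcm = openPcm(wav)) {
            long payload = pcm.getFrameLength() * pcm.getFormat().getFrameSize();
            System.out.println("Format: " + pcm.getFormat());
            System.out.println("Audio payload: " + payload + " bytes");
            System.out.println("Header and other chunks: "
                    + (wav.length() - payload) + " bytes");
        }
    }
}
```

Printing the difference between the file size and the payload size shows how many bytes a manual `skip` would have needed for that particular file.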

Here is the output (an album of screenshots I took): http://imgur.com/a/Ou9oH

eclipse

speech-recognition

cmusphinx

sphinx4

language-model

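1 Answer

Stack Overflow user

Accepted answer

Posted on 2015-06-24 11:30:41

As Nikolay said, the audio sounds odd, probably because you did not resample it the right way. To downsample the audio from the original 22050 Hz to the required 16 kHz, you can run the following command: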

sox Vocaroo.wav -r 16000 Vocaroo16.wav

Vocaroo16.wav will sound better, and it will (probably) give you better ASR results.
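The slow-motion effect described in the question is consistent with 22050 Hz samples being played back as if they were 16 kHz, although that is an assumption about how the file was produced. Under that assumption, the stretch factor is simply the ratio of the two rates, as this small sketch shows (the class and method names are hypothetical):

```java
import java.util.Locale;

public class RateMismatch {

    // Factor by which playback is stretched when audio recorded at
    // recordedHz is interpreted as assumedHz without resampling.
    public static double stretchFactor(double recordedHz, double assumedHz) {
        return recordedHz / assumedHz;
    }

    public static void main(String[] args) {
        double factor = stretchFactor(22050, 16000);
        System.out.println(String.format(Locale.ROOT,
                "Stretch factor: %.3f", factor));
        // If the 13 s reported in the question is the stretched length,
        // the speech would originally have lasted roughly this long.
        System.out.println(String.format(Locale.ROOT,
                "Estimated original duration: %.1f s", 13.0 / factor));
    }
}
```

A factor of about 1.38 means both the tempo and the pitch are lowered, which an acoustic model trained on normal 16 kHz speech is unlikely to handle well. Resampling with `sox`, as in the command above, changes the sample rate while preserving the original speed.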

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/31010166
