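I am porting my OCR application from C++ to Java. Using Tess4J, I want to get the bounding box of each word. However, TessResultIterator apparently does not provide any method for this. Is there some way to obtain this data?
Here is my current code: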
TessBaseAPI api = TessAPI1.TessBaseAPICreate();
TessAPI1.TessBaseAPIInit3(api, path, lang);
TessAPI1.TessBaseAPISetPageSegMode(api, TessAPI1.TessPageSegMode.PSM_AUTO);
TessAPI1.TessBaseAPISetImage(api, img, w, h, bpp, bpp*w);
TessAPI1.TessBaseAPIGetUTF8Text(api);
TessResultIterator it = TessAPI1.TessBaseAPIGetIterator(api);

In C++, I could continue like this:
char* text = it->GetUTF8Text(tesseract::RIL_WORD);
int left, top, right, bttm;
it->BoundingBox(tesseract::RIL_WORD, &left, &top, &right, &bttm);

Answered 2013-02-11 07:58:04
Could you try the following code snippet? I have not tested it thoroughly.
TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(api);
TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
String str = TessAPI1.TessResultIteratorGetUTF8Text(ri, TessPageIteratorLevel.RIL_WORD);
IntBuffer leftB = IntBuffer.allocate(1);
IntBuffer topB = IntBuffer.allocate(1);
IntBuffer rightB = IntBuffer.allocate(1);
IntBuffer bottomB = IntBuffer.allocate(1);
TessAPI1.TessPageIteratorBoundingBox(pi, TessPageIteratorLevel.RIL_WORD, leftB, topB, rightB, bottomB);
int left = leftB.get();
int top = topB.get();
int right = rightB.get();
int bottom = bottomB.get();

Answered 2016-09-14 19:45:03
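The method below is taken from the GitHub page for Tess4J. It shows how to iterate over the bounding box of every matched word in the input document. It should be easy to adapt this code to your own needs, for example by using the path to your own Tesseract data directory and the path to your own image file.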
// Note: handle, datapath, language, testResourcesDataPath and TRUE are not defined
// in this snippet; presumably they are members of the enclosing test class.
public void testResultIterator() throws Exception {
    // Load the image and convert it to the raw byte layout Tesseract expects.
    File tiff = new File(this.testResourcesDataPath, "eurotext.tif");
    BufferedImage image = ImageIO.read(new FileInputStream(tiff));
    ByteBuffer buf = ImageIOHelper.convertImageData(image);
    int bpp = image.getColorModel().getPixelSize();
    int bytespp = bpp / 8;
    int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);

    TessAPI1.TessBaseAPIInit3(handle, datapath, language);
    TessAPI1.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO);
    TessAPI1.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);

    // Run recognition before asking for an iterator.
    ETEXT_DESC monitor = new ETEXT_DESC();
    ITessAPI.TimeVal timeout = new ITessAPI.TimeVal();
    timeout.tv_sec = new NativeLong(0L); // time > 0 causes blank output
    monitor.end_time = timeout;
    TessAPI1.TessBaseAPIRecognize(handle, monitor);

    TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(handle);
    TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
    TessAPI1.TessPageIteratorBegin(pi);
    int level = TessPageIteratorLevel.RIL_WORD;

    do {
        // Text and confidence of the current word; the native string must be freed.
        Pointer ptr = TessAPI1.TessResultIteratorGetUTF8Text(ri, level);
        String word = ptr.getString(0);
        TessAPI1.TessDeleteText(ptr);
        float confidence = TessAPI1.TessResultIteratorConfidence(ri, level);

        // Bounding box of the current word, returned through one-element buffers.
        IntBuffer leftB = IntBuffer.allocate(1);
        IntBuffer topB = IntBuffer.allocate(1);
        IntBuffer rightB = IntBuffer.allocate(1);
        IntBuffer bottomB = IntBuffer.allocate(1);
        TessAPI1.TessPageIteratorBoundingBox(pi, level, leftB, topB, rightB, bottomB);
        int left = leftB.get();
        int top = topB.get();
        int right = rightB.get();
        int bottom = bottomB.get();
        System.out.print(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));

        // Font attributes of the current word.
        IntBuffer boldB = IntBuffer.allocate(1);
        IntBuffer italicB = IntBuffer.allocate(1);
        IntBuffer underlinedB = IntBuffer.allocate(1);
        IntBuffer monospaceB = IntBuffer.allocate(1);
        IntBuffer serifB = IntBuffer.allocate(1);
        IntBuffer smallcapsB = IntBuffer.allocate(1);
        IntBuffer pointSizeB = IntBuffer.allocate(1);
        IntBuffer fontIdB = IntBuffer.allocate(1);
        String fontName = TessAPI1.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB,
                monospaceB, serifB, smallcapsB, pointSizeB, fontIdB);
        boolean bold = boldB.get() == TRUE;
        boolean italic = italicB.get() == TRUE;
        boolean underlined = underlinedB.get() == TRUE;
        boolean monospace = monospaceB.get() == TRUE;
        boolean serif = serifB.get() == TRUE;
        boolean smallcaps = smallcapsB.get() == TRUE;
        int pointSize = pointSizeB.get();
        int fontId = fontIdB.get();
        System.out.println(String.format(" font: %s, size: %d, font id: %d, bold: %b,"
                + " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b", fontName, pointSize,
                fontId, bold, italic, underlined, monospace, serif, smallcaps));
    } while (TessAPI1.TessPageIteratorNext(pi, level) == TRUE);
}

The output of the code above is a series of lines, one per matched word, like this:
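SOME_WORD 65 60 120 83 96.072098 font: null, size: 32, font id: -1, bold: false, italic: false, underlined: false, monospace: false, serif: false, smallcap: false
The first four numbers after the matched word SOME_WORD are the left, top, right, and bottom coordinates. Next comes the confidence, given as a percentage. After that there is some metadata about the text itself, including font style information.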
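Since the four numbers are edges rather than a position plus a size, it can be handy to wrap them in a small value type before drawing or cropping. The following is only a sketch: WordBox and its methods are hypothetical names, not part of Tess4J, and it assumes that width and height are simply right - left and bottom - top. It reads the one-element buffers with the absolute get(0) instead of the relative get(), so the same buffers could be reused across loop iterations without their position running past the end.

```java
import java.awt.Rectangle;
import java.nio.IntBuffer;

/** Hypothetical helper (not part of Tess4J): one word's box given as edge coordinates. */
public final class WordBox {
    public final int left;
    public final int top;
    public final int right;
    public final int bottom;

    public WordBox(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    /**
     * Builds a box from the one-element buffers filled by a bounding-box call.
     * The absolute get(0) does not move the buffer position, so the buffers
     * can be passed to the native call again on the next iteration.
     */
    public static WordBox fromBuffers(IntBuffer leftB, IntBuffer topB,
                                      IntBuffer rightB, IntBuffer bottomB) {
        return new WordBox(leftB.get(0), topB.get(0), rightB.get(0), bottomB.get(0));
    }

    /** Assumption: width is the distance between the left and right edges. */
    public int width() {
        return right - left;
    }

    /** Assumption: height is the distance between the top and bottom edges. */
    public int height() {
        return bottom - top;
    }

    /** Converts edge coordinates to the (x, y, width, height) form used by java.awt. */
    public Rectangle toRectangle() {
        return new Rectangle(left, top, width(), height());
    }

    public static void main(String[] args) {
        // Stand-ins for the buffers a bounding-box call would have filled,
        // using the coordinates from the sample output line.
        IntBuffer leftB = IntBuffer.wrap(new int[] {65});
        IntBuffer topB = IntBuffer.wrap(new int[] {60});
        IntBuffer rightB = IntBuffer.wrap(new int[] {120});
        IntBuffer bottomB = IntBuffer.wrap(new int[] {83});

        WordBox box = WordBox.fromBuffers(leftB, topB, rightB, bottomB);
        System.out.println(box.toRectangle());
    }
}
```

In the loop from the answer, the four buffers could then be allocated once before the do/while and handed to WordBox.fromBuffers after each bounding-box call.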
https://stackoverflow.com/questions/14794551