7. Textual Images (Artificial Intelligence Laboratory)

Contents
• The idea of textual image compression
• Lossy and lossless compression
• Extracting marks
• Template matching
• From marks to symbols
• Coding the components of a textual image
• Performance: Lossy and lossless modes

The idea of textual image compression (1)
• Textual image compression: build a library of shapes and replace each mark with a pointer into the library
  – Find, isolate, and extract all the marks in the image
    • Letters made of separate parts, such as i and j, are stored in the library as a pair
    • Size thresholds exclude very small specks and very large marks
  – Construct a library
    • See Figure 7.2
    • If no sufficiently close match is found, the mark is added as a new symbol
    • Marks are extracted from the image as bitmaps, not as characters

The idea of textual image compression (2)
  – Identify the symbol in the library, and measure the coordinate offset
    • See Table 7.1
    • approximate symbol + symbol number + intersymbol gaps
  – Compress and store the library, the symbol sequence, and the offsets
    • See Figure 7.4
    • Differences from the original image
      – Very small or very large groups of pixels are excluded
      – Characters that were only similar in the original image are rendered with the identical symbol
      – Marks that occur infrequently are dropped from the library
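
The library-building steps above can be sketched in a few lines. This is only an illustration under stated assumptions: marks are small 0/1 bitmaps stored as lists of rows, the matcher is a plain count of differing pixels between same-sized bitmaps, and the names `pixel_difference`, `build_library`, and `max_difference` are hypothetical, not taken from the slides.

```python
def pixel_difference(a, b):
    """Count differing pixels; None if the bitmaps differ in size (assumed rule)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return None
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def build_library(marks, max_difference=1):
    """Return (library, symbol_sequence) for a list of mark bitmaps."""
    library = []   # one template bitmap per symbol number
    sequence = []  # symbol number chosen for each mark, in order
    for mark in marks:
        for number, template in enumerate(library):
            diff = pixel_difference(mark, template)
            if diff is not None and diff <= max_difference:
                sequence.append(number)  # close enough: reuse the symbol
                break
        else:
            library.append(mark)         # no close match: add a new symbol
            sequence.append(len(library) - 1)
    return library, sequence


a = [[1, 1], [1, 0]]
a_noisy = [[1, 1], [1, 1]]  # differs from `a` in one pixel
b = [[0, 0], [0, 1]]
library, sequence = build_library([a, b, a_noisy, b])
print(len(library), sequence)
```

Here the noisy copy of `a` is mapped to the same symbol as `a`, which is exactly the lossy substitution described on the slide.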

The idea of textual image compression (3)
  – Store enough additional information
    • Store enough information to rebuild the exact original image from the reconstructed image
    • The residue bitmap is formed by taking the exclusive-OR of the reconstructed image and the original image
    • The residue is hard to compress well on its own, so it is effective to make use of the reconstructed text when coding it (Section 7.6)
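
A minimal sketch of the exclusive-OR step, assuming bilevel images stored as lists of 0/1 rows of equal size (the function name `xor_residue` is illustrative only). Because XOR is its own inverse, applying the residue to the reconstructed image gives back the original.

```python
def xor_residue(first, second):
    """Pixelwise exclusive-OR of two equally sized 0/1 bitmaps."""
    return [[p ^ q for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


original = [[0, 1, 1, 0],
            [0, 1, 0, 0],
            [1, 1, 1, 0]]
reconstructed = [[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0]]

residue = xor_residue(original, reconstructed)   # 1 wherever the images differ
restored = xor_residue(reconstructed, residue)   # decoder side: undo the loss
print(residue)
print(restored == original)
```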

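Lossy and lossless compression (1)
• Early systems stored the symbol sequence and the offsets without compressing them → 16:1
• Once a symbol matched a library element it was removed from the image; what was left was stored as a residue that could be compressed with a conventional image compression method
• Improving the coding of the offsets between the symbols extracted from the image, and also compressing the library patterns → 25:1
• Compression improves if common fonts are loaded into the library and no residue is used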

Lossy and lossless compression (2)
• Lossy textual image compression performs better when the template is the average of the bitmaps of all marks that match a given symbol
• For lossless compression, Step 5 must encode the entire residue, including specks and halos, so that the original image can be reproduced exactly
• Encoding in lossless mode at Step 5 accounts for 3/4 of the total, so lossy encoding is the effective choice for noncritical applications

Lossy and lossless compression (3)
• In lossy compression the compression ratio is maximized when no symbols are omitted from the library; in lossless compression it is maximized when a suitable library management policy is used

Extracting marks (1)
• Tracing the boundary of a mark
  – The boundary-tracing algorithm depends on the connectivity of the pixels that form a group
  – Uses 8-connectivity
  – Uses the current, previous, and next pixels (see Figure 7.5)
  – Fast procedure for boundary tracing (see Figure 7.6)
  – The minimum and maximum x and y values are maintained during the operation; pixels outside the image border are treated as white

Extracting marks (2)
• Removing the mark from the image
  – Flood-fill method
    • Start from a seed pixel, store it in the target array, and remove it from the image; then apply the flood-filling procedure recursively to the pixel's 8 neighbors
  – Run-based region fill algorithm
    • See Figure 7.7
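
The flood-fill removal can be sketched as follows. Two assumptions to note: the image is a mutable list of 0/1 rows, and the recursion described on the slide is replaced here by an explicit stack, which visits the same 8-connected pixels but avoids Python's recursion limit on large marks. The names `extract_mark` and `extract_all_marks` are illustrative, and this is not the run-based algorithm of Figure 7.7.

```python
def extract_mark(image, seed_row, seed_col):
    """Remove the 8-connected mark containing the seed; return its pixels."""
    height, width = len(image), len(image[0])
    pixels = []                      # plays the role of the target array
    stack = [(seed_row, seed_col)]
    while stack:
        r, c = stack.pop()
        if not (0 <= r < height and 0 <= c < width):
            continue                 # outside the image counts as white
        if image[r][c] == 0:
            continue                 # white, or already removed
        image[r][c] = 0              # remove the pixel from the image
        pixels.append((r, c))
        for dr in (-1, 0, 1):        # visit all 8 neighbors
            for dc in (-1, 0, 1):
                if dr or dc:
                    stack.append((r + dr, c + dc))
    return sorted(pixels)


def extract_all_marks(image):
    """Scan in raster order and pull out every mark, emptying the image."""
    marks = []
    for r in range(len(image)):
        for c in range(len(image[0])):
            if image[r][c]:
                marks.append(extract_mark(image, r, c))
    return marks


page = [[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 1]]
marks = extract_all_marks(page)
print(marks)
```

The two diagonal pixels at the top left form a single mark only because 8-connectivity is used; with 4-connectivity they would be extracted separately.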

Template matching (1)
• Global template-matching
  – Measures the overall mismatch between a new symbol and a library template
  – Error pixels that fall inside a cluster are given somewhat more weight (examples: two e's; c and o)
  – The match is rejected if the weighted error exceeds a threshold
  – See Figures 7.8 and 7.9
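
One possible form of the weighted error is sketched below. The slide does not give the weighting formula, so this assumes that each error pixel contributes the number of error pixels in its 3×3 neighborhood (itself included), that the two bitmaps are already aligned and of equal size, and that the threshold value is arbitrary. `weighted_mismatch` and `is_match` are hypothetical names.

```python
def weighted_mismatch(mark, template):
    """Sum, over error pixels, of the error count in each 3x3 neighborhood."""
    height, width = len(mark), len(mark[0])
    # Error map: 1 wherever the mark and the template disagree.
    error = [[m ^ t for m, t in zip(mark_row, tmpl_row)]
             for mark_row, tmpl_row in zip(mark, template)]
    total = 0
    for r in range(height):
        for c in range(width):
            if not error[r][c]:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += error[rr][cc]
    return total


def is_match(mark, template, threshold):
    """Accept the match unless the weighted error exceeds the threshold."""
    return weighted_mismatch(mark, template) <= threshold


template = [[0] * 4 for _ in range(4)]
scattered = [[1, 0, 0, 1],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [1, 0, 0, 0]]
clustered = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
print(weighted_mismatch(scattered, template))
print(weighted_mismatch(clustered, template))
```

Both marks differ from the template in three pixels, but the clustered errors score higher, so a threshold between the two scores accepts the scattered noise and rejects the structural difference.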

Template matching (2)
• Local template-matching
  – Considers the individual areas of the error map
  – Reject a match if
    • a black pixel in the error map has two or more black neighbors, at least two of which are not connected to each other
    • the corresponding pixel in either one of the bitmaps is entirely surrounded by either white or black pixels
  – Sometimes gives false matches for small characters
  – Local methods are sensitive to the size of the symbols, so the local patterns examined should consist of a suitable number of pixels

Symbols and their offsets
• Each symbol is assigned a unique identification number
• The x, y offset of a symbol is measured from the lower-right corner of one mark to the lower-left corner of the next mark
• Handling whitespace
  – Insert special codes
  – The encoder and the decoder must both know these special codes in advance
  – Spaces, tabs, and newlines need to be distinguished
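
The offset convention can be illustrated with a small round trip. The sketch assumes that y grows downward, that `right` is the x-coordinate just past the mark (so `right = left + width`), and that the first mark is measured from the origin (0, 0). `PlacedMark`, `encode_offsets`, and `decode_positions` are illustrative names; whitespace codes are left out.

```python
from typing import NamedTuple


class PlacedMark(NamedTuple):
    symbol: int   # library identification number
    left: int     # x of the lower-left corner
    right: int    # x of the lower-right corner (left + width, assumed)
    bottom: int   # y of the bottom edge


def encode_offsets(marks):
    """Turn absolute positions into (symbol, dx, dy) relative offsets."""
    encoded = []
    prev_x, prev_y = 0, 0  # assumed starting point for the first mark
    for mark in marks:
        encoded.append((mark.symbol, mark.left - prev_x, mark.bottom - prev_y))
        prev_x, prev_y = mark.right, mark.bottom
    return encoded


def decode_positions(encoded, widths):
    """Rebuild (symbol, left, bottom) using symbol widths from the library."""
    positions = []
    prev_x, prev_y = 0, 0
    for symbol, dx, dy in encoded:
        left, bottom = prev_x + dx, prev_y + dy
        positions.append((symbol, left, bottom))
        prev_x, prev_y = left + widths[symbol], bottom
    return positions


widths = {0: 5, 1: 3}
marks = [PlacedMark(0, 10, 15, 20),
         PlacedMark(1, 17, 20, 20),
         PlacedMark(0, 22, 27, 21)]
encoded = encode_offsets(marks)
positions = decode_positions(encoded, widths)
print(encoded)
print(positions)
```

Within a line of text most offsets are small and repetitive (here a gap of 2 pixels and a vertical shift of 0 or 1), which is what makes them cheap to code.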

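Coding the components of a textual image
• Each component of the decomposed image is encoded as a single bit-stream
• Library
  – The total number of symbols in the library is transmitted first
  – Each bit-stream is then transmitted (the height and width are encoded first)
  – Bitmap content is encoded using a two-level method
• Symbol numbers
  – Use the PPM technique
• Symbol offsets
  – Use adaptive coding
• Original image
  – The difference between the original image and the reconstructed image is given by the residue
  – The better the compression method applied, the smaller the residue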