June 3, 2022

We Built One of the Top OCR Tools

Brandon Kunkel

Code is everywhere you look these days: it’s written, copied, pasted, and shared on websites, in videos and articles, in messengers like Slack or Discord, and, of course, in your tools. In the most frustrating cases, the code you want to rip and reuse in your project is buried inside an image or a YouTube video, so you have to type out each character manually. We’re happy to eliminate that annoyance forever with the launch of CodeFromScreenshot.com, one of the top OCR tools!

The code extraction feature first shipped inside our flagship AI-powered code snippet tool, Pieces for Developers (which is incredibly awesome and gives you superhuman speed when saving, reusing, and sharing snippets). We decided to let that feature shine on its own with its very own web-based utility site.

CodeFromScreenshot.com provides a quick way to use code from screenshots. It can even identify the language of the extracted code! We designed this site to be fast, easy to use, and ad-free. It is the culmination of six months of development work on our very own Runtime.dev web APIs and our machine learning team’s intense work to build OCR models that are fine-tuned to interpret technical language (that is, code) rather than natural language.

You may be wondering:

Why are people saving or sending screenshots of code in the first place?
Can’t I just embed code?
Why did we build our own OCR models rather than use existing OCR tools?

Why are people saving or sending screenshots of code in the first place?

The Pieces dev team constantly shares screenshots of code and error traces. We prefer code screenshots in many cases because they preserve text formatting and hence communicate more than just the text. Look at these two error messages in Slack, and tell me which one communicates why my C++ code won’t compile :)

Copied and pasted code without formatting or syntax highlighting

A screenshot of code with proper formatting and syntax highlighting

Can’t I just embed code?

Developer-focused content creators for publications like Medium may prefer to spend more time developing code and less time formatting it. Medium’s code embedding features are limited: to get full syntax highlighting, you need to use an external code embedding service. Some content creators opt for screenshots of their work instead, but readers want to copy code and use it without having to reproduce it character by character.

Check out the hoops some people jump through to embed and format code on Medium in this article.

Why did we build our own OCR models rather than use existing OCR tools?

Most available OCR models were developed to extract plain text from images. We found that these models underserve developers, whose images and screenshots of code are full of code-specific punctuation such as braces, semicolons, and operators. Common OCR models, which are designed for plain text, miss this punctuation. That is why we painstakingly designed our OCR models for code extraction.
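To make the punctuation problem concrete, here is a minimal Python sketch of how you might count the punctuation an OCR pass drops from a code snippet. It is only an illustration: the function name missing_punctuation and both sample strings are made up for this post, and nothing here reflects how our models or any particular OCR tool actually behave.

```python
from collections import Counter
import string

# Punctuation characters that often carry meaning in source code.
CODE_PUNCTUATION = set(string.punctuation)


def missing_punctuation(expected: str, extracted: str) -> Counter:
    """Count punctuation present in `expected` but lost in `extracted`."""
    expected_counts = Counter(ch for ch in expected if ch in CODE_PUNCTUATION)
    extracted_counts = Counter(ch for ch in extracted if ch in CODE_PUNCTUATION)
    # Counter subtraction keeps only the positive differences.
    return expected_counts - extracted_counts


# Hypothetical strings for illustration, not output from a real OCR model.
expected = "std::vector<int> v = {1, 2, 3};"
extracted = "std:vector<int> v = {1, 2, 3}"

dropped = missing_punctuation(expected, extracted)
print(dict(dropped))
```

In this made-up case, one colon and the trailing semicolon go missing, which is enough to stop the C++ line from compiling. A count like this only compares totals, so it would not catch punctuation that was swapped for a different character in a different position.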

Try out one of the top OCR tools for yourself at codefromscreenshot.com. Better yet, download Pieces for Developers and take advantage of the same feature and a lot more!

P.S. We want to shout out some of the other web utilities we took inspiration from, including:

https://tinypng.com/ - minifies PNGs so your web content loads faster
https://www.minifier.org/ - minifies JS and CSS
https://www.diffchecker.com/ - checks for differences between two samples of text
https://regex-generator.olafneumann.org/ - generates a regular expression from sample text

Lead photo from FreeCodeCamp's Dart Programming Tutorial on YouTube
