In this post I try out the Computer Vision API, which is provided as part of Microsoft Cognitive Services.

What is Microsoft Cognitive Services?

Microsoft Cognitive Services is a set of machine learning services provided by Microsoft. They are exposed as APIs for tasks such as facial recognition and speech recognition.

Several Cognitive Services are currently available, and each API can be used for free up to a certain limit.

What is the Computer Vision API?

The Computer Vision API, which I use in this post, is one of the image recognition APIs and supports many image recognition tasks. Today I look at its OCR function, but the API offers various other functions as well, such as:

Tagging Images

Describing Images

Creating Image Thumbnails

Join the Computer Vision API Preview

First, I will sign up for the “Computer Vision API” preview.

https://www.microsoft.com/cognitive-services/en-us/computer-vision-api

Click on “Get started for free”. (Note: this only works in Internet Explorer.)

Once you log in with a Microsoft account, you are redirected to the preview site.

Scroll down to “Computer Vision – Preview” and tick the box.

The API is currently free, but usage is limited to 5,000 calls per month and 20 calls per minute.
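
If you plan to call the API in a loop, it may help to guard against the 20-calls-per-minute limit on the client side. The following is only a minimal sketch: CallThrottle is a hypothetical helper of my own, not part of the Computer Vision SDK, and it does not track the monthly quota. It keeps the timestamps of recent calls and refuses a new call while 20 are already inside the last minute.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Computer Vision SDK): a sliding-window
// limiter that allows at most maxCalls calls within any window of time.
class CallThrottle
{
    private readonly int maxCalls;
    private readonly TimeSpan window;
    private readonly Queue<DateTime> timestamps = new Queue<DateTime>();

    public CallThrottle(int maxCalls, TimeSpan window)
    {
        this.maxCalls = maxCalls;
        this.window = window;
    }

    // Returns true and records the call if it fits within the limit at time 'now';
    // returns false if the caller should wait. 'now' must not go backwards.
    public bool TryAcquire(DateTime now)
    {
        // Forget calls that have already left the window.
        while (timestamps.Count > 0 && now - timestamps.Peek() >= window)
        {
            timestamps.Dequeue();
        }
        if (timestamps.Count >= maxCalls)
        {
            return false;
        }
        timestamps.Enqueue(now);
        return true;
    }
}

class ThrottleDemo
{
    static void Main()
    {
        var throttle = new CallThrottle(20, TimeSpan.FromMinutes(1));
        DateTime start = new DateTime(2016, 1, 1, 0, 0, 0);

        // Try 25 calls at the same instant; only the first 20 are allowed.
        int allowed = 0;
        for (int i = 0; i < 25; i++)
        {
            if (throttle.TryAcquire(start)) allowed++;
        }
        Console.WriteLine(allowed);                                   // prints 20
        Console.WriteLine(throttle.TryAcquire(start.AddSeconds(61))); // prints True
    }
}
```

In a real program you would pass DateTime.UtcNow and wait (for example with Task.Delay) whenever TryAcquire returns false.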

If you scroll down further you will find the agreements section. Tick the box for “I agree to the Microsoft Cognitive Services Terms…” and click on “Subscribe”.

Now your registration is complete.

You will have two API keys, “Key 1” and “Key 2”.

Initially, the keys are displayed as “XXXXXX”, so click on “Show” to reveal them.

You can check your usage by clicking on “Show Quota”.
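
Since the portal hides the keys by default, you may prefer to keep them out of your source code as well. Here is a minimal sketch, assuming the key is stored in an environment variable. The name VISION_API_KEY and the ApiKeyLoader class are my own hypothetical choices, not something the service defines.

```csharp
using System;

// Hypothetical helper: reads the API key from an environment variable
// instead of hard-coding it in the source.
static class ApiKeyLoader
{
    // Assumed variable name; pick whatever suits your environment.
    public const string DefaultVariable = "VISION_API_KEY";

    public static string Load(string variableName = DefaultVariable)
    {
        string key = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                "Environment variable " + variableName + " is not set.");
        }
        // Trim stray whitespace that often sneaks in when pasting keys.
        return key.Trim();
    }
}

class ApiKeyDemo
{
    static void Main()
    {
        // For demonstration only: set a dummy value for this process.
        Environment.SetEnvironmentVariable(ApiKeyLoader.DefaultVariable, "dummy-key");
        Console.WriteLine(ApiKeyLoader.Load());
    }
}
```

The returned string can then be passed wherever the key is needed.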

Develop with Computer Vision API and try out the OCR

Now that the API keys have been generated, I will open Visual Studio and create a simple console program.
Officially, Visual Studio 2015 Community Edition or higher is supported.

First, create a new project and select “Console application”.

Now add two NuGet packages.

One is Newtonsoft.Json, for JSON conversions, and the other is Microsoft.ProjectOxford.Vision, which includes the Computer Vision API client.

Now for the coding…

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using System.IO;

namespace VisionAPI
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Enter the path of the image file to run OCR on:");
            string imageFilePath = Console.ReadLine();

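            // Resolve relative paths first; new Uri() throws on a non-absolute path.
            Uri fileUri = new Uri(Path.GetFullPath(imageFilePath));
            // Block until the OCR task completes so that any exception surfaces here
            // (async Main is not available in C# 6 / Visual Studio 2015).
            DoWork(fileUri, true).GetAwaiter().GetResult();
            Console.ReadKey();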
        }

        static async Task<OcrResults> UploadAndRecognizeImage(string imageFilePath, string language)
        {

            // Create the Project Oxford Vision API service client.
            // Replace the placeholder with the API key generated earlier.
            VisionServiceClient visionServiceClient = new VisionServiceClient("Enter your API access key here");

            using (Stream imageFileStream = File.OpenRead(imageFilePath))
            {
                // Upload the image and perform OCR
                OcrResults ocrResult = await visionServiceClient.RecognizeTextAsync(imageFileStream, language);
                return ocrResult;
            }
        }

        // Run OCR on the given image and print the result.
        // Note: the upload flag is not used in this sample.
        static async Task DoWork(Uri imageUri, bool upload)
        {
            Console.WriteLine("Running OCR...");

            // "ja" selects Japanese; change this to match the language of your image.
            string languageCode = "ja";
            OcrResults ocrResult = await UploadAndRecognizeImage(imageUri.LocalPath, languageCode);
            Console.WriteLine("OCR analysis complete");

            // Print the analysis result to the console
            Console.WriteLine("");
            Console.WriteLine("OCR results:");
            LogOcrResults(ocrResult);
        }

        static void LogOcrResults(OcrResults results)
        {
            StringBuilder stringBuilder = new StringBuilder();

            if (results != null && results.Regions != null)
            {
                stringBuilder.Append("Text:");
                stringBuilder.AppendLine();
                foreach (var item in results.Regions)
                {
                    foreach (var line in item.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            stringBuilder.Append(word.Text);
                            stringBuilder.Append(" ");
                        }

                        stringBuilder.AppendLine();
                    }

                    stringBuilder.AppendLine();
                }
            }

            Console.WriteLine(stringBuilder.ToString());
        }
    }
}
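
The upload flag in DoWork suggests a second mode in which the image is referenced by URL rather than uploaded. As a rough sketch of what that could look like, assuming the string overload of RecognizeTextAsync in the Microsoft.ProjectOxford.Vision package: the class name, the API key placeholder, and the example.com URL below are all placeholders of mine, and running it needs a valid key and network access.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

class UrlOcrSketch
{
    // Sends an image URL (instead of an uploaded file stream) to the OCR service.
    static async Task<OcrResults> RecognizeUrl(string apiKey, string imageUrl, string language)
    {
        var client = new VisionServiceClient(apiKey);
        return await client.RecognizeTextAsync(imageUrl, language);
    }

    static void Main(string[] args)
    {
        // Placeholders: substitute your own key and a publicly reachable image URL.
        string apiKey = "Enter your API access key here";
        string imageUrl = "https://example.com/sample.png";

        OcrResults result = RecognizeUrl(apiKey, imageUrl, "ja").GetAwaiter().GetResult();

        // Count the recognized text regions, if any were returned.
        int regionCount = 0;
        if (result != null && result.Regions != null)
        {
            foreach (var region in result.Regions)
            {
                regionCount++;
            }
        }
        Console.WriteLine("Regions found: " + regionCount);
    }
}
```

The result has the same shape as in the upload case, so it can be printed with the same LogOcrResults method.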

And that’s it.

Let’s try it out.

This was the image I used for testing, and the result is below.

Although not everything was recognized correctly, I think it’s pretty good!

Credits to: https://www.microsoft.com/cognitive-services/en-us/documentation