Photo by Finn Mund on Unsplash

Google's Tesseract Optical Character Recognition (OCR) engine is arguably the most popular out-of-the-box OCR solution. Recently, I was tasked with building an OCR tool for documents. Although I was aware of its robustness, out of curiosity I wanted to investigate how well it performs on documents specifically.

As always, the starting point was to source a reliable ground truth before thinking about synthesising one of my own. Luckily, I found one: the DDI-100 dataset by the Machine Intelligence Team at the Moscow Institute of Physics and Technology. It has about 30GB of data with character-level ground truth, which is sweet! However…
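With character-level ground truth, one common way to score OCR output is the character error rate (CER): the Levenshtein edit distance between the ground-truth string and the OCR string, divided by the number of ground-truth characters. The sketch below is an illustration, not necessarily the metric used in this article. It assumes both texts are already plain Python strings; the prediction could come from, for example, pytesseract's `image_to_string`. The sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Keep only the previous row of the dynamic-programming table (O(len(b)) memory).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, prediction: str) -> float:
    """CER = edit distance / number of ground-truth characters."""
    if not ground_truth:
        # Convention chosen for this sketch: empty truth scores 0 only if nothing was predicted.
        return 0.0 if not prediction else 1.0
    return levenshtein(ground_truth, prediction) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical example: one substitution (o -> 0) and one deletion (".").
    print(character_error_rate("Invoice No. 1042", "Inv0ice No 1042"))  # 0.125
```

A CER of 0 means a perfect transcription. Because insertions count as errors, CER can exceed 1 when the OCR output is much longer than the ground truth.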

Photo by M. B. M. on Unsplash

I recently embarked on the task of building a prediction model that forecasts the next-day movement of stock prices on the Australian Securities Exchange (ASX). I am writing this piece to share my journey towards a more reliable strategy based on a long short-term memory (LSTM) model built on approximately 2.5 years of end-of-day (EOD) financial market data from the ASX. The Jupyter notebook can be downloaded from my repository here.
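An LSTM learns from fixed-length sequences, so a daily price series first has to be cut into sliding windows, each paired with a next-day target. The sketch below is one possible framing, not the article's own preprocessing. The window length (`lookback`) and the binary up/down label (1 if the next close is higher than the last close in the window) are assumptions made for illustration, and `make_windows` is a hypothetical helper.

```python
import numpy as np


def make_windows(close, lookback: int):
    """Turn a 1-D series of closing prices into (X, y) pairs for a sequence model.

    X[i] holds `lookback` consecutive closes; y[i] is 1 if the close on the
    following day is higher than the last close in that window, else 0.
    """
    close = np.asarray(close, dtype=float)
    if close.ndim != 1 or len(close) <= lookback:
        raise ValueError("need a 1-D series longer than lookback")
    n_samples = len(close) - lookback
    X = np.stack([close[i:i + lookback] for i in range(n_samples)])
    next_close = close[lookback:]        # the day right after each window
    last_close = close[lookback - 1:-1]  # the final day inside each window
    y = (next_close > last_close).astype(int)
    # LSTM layers typically expect input shaped (samples, timesteps, features).
    return X[..., np.newaxis], y


if __name__ == "__main__":
    X, y = make_windows([10, 11, 10.5, 12, 12], lookback=2)
    print(X.shape, y)  # (3, 2, 1) [0 1 0]
```

In practice the series would usually be scaled first and split chronologically (train on earlier dates, test on later ones) so that no future information leaks into training.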

1. Data Exporting

The first task is converting the EOD data into five separate time-series data frames: one each for open, high, low, close and volume…
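As a rough illustration of this step, assume the export is a single "long" table with one row per ticker per trading day. Pivoting each field then yields one wide frame per field, with dates as rows and tickers as columns. The tickers `AAA`/`BBB` and all prices below are made-up toy values, and the column names are assumptions rather than the article's actual schema.

```python
import pandas as pd

# Hypothetical long-format EOD export (toy values, not real ASX data).
eod = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "date": pd.to_datetime(["2019-01-02", "2019-01-02", "2019-01-03", "2019-01-03"]),
    "open": [34.0, 71.0, 34.5, 70.5],
    "high": [34.8, 71.9, 35.1, 71.2],
    "low": [33.7, 70.6, 34.2, 70.1],
    "close": [34.6, 71.5, 34.9, 70.8],
    "volume": [8_000_000, 2_100_000, 7_500_000, 2_300_000],
})

# One wide frame per field: index = trading date, columns = tickers.
fields = ["open", "high", "low", "close", "volume"]
frames = {f: eod.pivot(index="date", columns="ticker", values=f) for f in fields}

print(frames["close"])
```

`pivot` raises an error if a (date, ticker) pair appears more than once, which is a useful check against duplicated rows in the export. Days on which a ticker did not trade show up as `NaN` and need an explicit fill or drop policy.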

Arvind Rajan

Data Scientist at DNS Technology | Former AI Engineer at Brookfield Asset Management.
