At Receipt Bank, we are working hard to make bookkeeping as simple as possible for our users. One of our main challenges is extracting structured data from payment documents. We’re looking for precise answers to questions like:
Given this photo of a receipt, what is the total amount?
Given this PDF invoice, who is the supplier?
We are exploring several technologies to solve this:
OCR can help with extracting the text out of scanned or photographed documents
Computer vision can clean up the noise to make the OCR more accurate
Machine learning can be used to figure out which part is the total amount, which is the VAT and so on
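To give a flavor of the problem, here is a minimal sketch of a rule-based baseline for the "total amount" question. It assumes the OCR step has already produced plain text lines; the function name `extract_total`, the amount format, and the keyword scoring are illustrative assumptions, not a description of our production system. A learned model would replace the hand-written scoring.

```python
import re

# Assumption: amounts look like "12.50" or "1,234.56" (two decimal places).
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}")


def extract_total(ocr_lines):
    """Return the most likely total amount from OCR'd receipt lines, or None.

    Hypothetical baseline: score each line by keyword, then prefer the
    highest score and, within the same score, the largest amount.
    """
    best = None  # (score, amount)
    for line in ocr_lines:
        amounts = [float(m.replace(",", "")) for m in AMOUNT_RE.findall(line)]
        if not amounts:
            continue
        lowered = line.lower()
        if "subtotal" in lowered:
            score = 1  # plausible, but usually excludes tax
        elif "total" in lowered:
            score = 2  # strongest candidate
        else:
            score = 0  # any other line that contains an amount
        candidate = (score, max(amounts))
        if best is None or candidate > best:
            best = candidate
    return None if best is None else best[1]


receipt = [
    "COFFEE 3.50",
    "SANDWICH 6.00",
    "SUBTOTAL 9.50",
    "VAT 20% 1.90",
    "TOTAL 11.40",
]
print(extract_total(receipt))
```

Hand-written rules like these break down quickly on noisy scans and unusual layouts, which is where the machine learning part comes in.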
If you feel at home with things like support vector machines, k-means clustering and neural networks, you will certainly enjoy developing solutions alongside us.
We use a Python 3.5-based stack with the relevant ecosystem tools such as NumPy, SciPy, scikit-learn, pandas, OpenCV and others. We’d also like to do more serious testing with TensorFlow, Theano, word2vec and other deep learning technologies.
The data needed for analysis and training is already present – more than 20 million documents have their data correctly extracted and verified (mostly manually). We’re also in the process of building a powerful playground, where you can develop an algorithm, test it quickly in the cloud and get a detailed accuracy report.
We’re one of the fastest-growing companies in Europe and among the UK’s top 30 startups. We work remotely, but also have a nice office with a great view in the center of Ljubljana. Our processes are well structured and quick: taking something from prototype to production can be done in a single day.