BizBook Tech Talk Pt 1: Extract Text from PDF Using C# and iTextSharp

One of my clients, who currently uses BizBook, previously used WaveApps for a few months. This post shows how I extracted the text of individual invoices from their PDF files and saved it in my database.

They were happy with Wave at first. But once they grew bigger, Wave stopped supporting bulk downloads of invoices as Excel or CSV files. So when they decided to move to BizBook, they had to bring all of their existing data into the BizBook system. Since Wave no longer allowed those downloads, we tried to reach Wave via Facebook and email, and did our best to find a phone number. We couldn't get through to them, so it seemed we had no choice but to enter the data into BizBook manually.

After analyzing a couple of invoices, I figured out that they follow a consistent pattern. Check the image below.

[Image: BizBook tech talk, Wave invoice]

So I decided to extract the text from the PDFs and insert the invoices into BizBook through my own API, using a small export app. Below is the heart of the small tool.
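The GetTextFromPDF method called by the export function below isn't shown in this post. Here is a minimal sketch of what it could look like, assuming iTextSharp 5.x (the iTextSharp NuGet package). The PdfTextHelper class name is hypothetical, and using LocationTextExtractionStrategy is my assumption rather than a confirmed detail. It returns one string per page, which is the shape the export loop expects.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class; in the real tool this method could live in the same class as ExportPdfsToTxt.
public static class PdfTextHelper
{
    // Returns the text of each page, in page order (iTextSharp page numbers are 1-based).
    public static IEnumerable<string> GetTextFromPDF(string path)
    {
        var pages = new List<string>();
        var reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // LocationTextExtractionStrategy orders text chunks by their position on the page,
                // so text sharing a baseline tends to come out on the same line.
                var strategy = new LocationTextExtractionStrategy();
                pages.Add(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            // Release the file handle even if extraction fails partway through.
            reader.Close();
        }
        return pages;
    }
}
```

Reading every page eagerly into a list keeps the PDF open only for the duration of the call, which matters when the export loop walks through a whole folder of files.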

 

Then I wrote a function to clean up the extracted text and save it to .txt files.

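    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    private static void ExportPdfsToTxt(string folder)
    {
        // Output folder for the .txt files; CreateDirectory does nothing if it already exists.
        string txts = Path.Combine(folder, "txts");
        Directory.CreateDirectory(txts);

        // Text to strip from every page. The value used for these invoices isn't shown,
        // so fill in whatever boilerplate your own PDFs contain.
        const string textToRemove = "";

        // Only process PDF files, so the generated .txt files and anything else are skipped.
        foreach (string file in Directory.GetFiles(folder, "*.pdf"))
        {
            var f = new FileInfo(file);
            IEnumerable<string> textFromPdf = GetTextFromPDF(file);

            var e = new StringBuilder();
            foreach (string v in textFromPdf)
            {
                // string.Replace throws on an empty search string, so only replace when there is something to remove.
                string cleaned = textToRemove.Length > 0 ? v.Replace(textToRemove, string.Empty) : v;
                e.Append(cleaned).Append('\n');
            }

            // The tick count keeps file names unique if the same PDF is exported more than once.
            File.WriteAllText(Path.Combine(txts, $"{f.Name}-{DateTime.Now.Ticks}.txt"), e.ToString());
        }
    }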

Next, I read the text files back and mapped their contents onto my model's properties.

[Image: BizBook tech talk pdf]
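Because this step depends entirely on the invoice layout, here is only a rough sketch of the idea. It assumes each invoice's text contains lines such as "Invoice Number: 1042", "Invoice Date: March 5, 2018" and "Total: $1,250.00". Those labels, the date format, the InvoiceTextParser class, and every Sale property except OrderNumber are hypothetical; adjust them to match your own invoices.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical, minimal model: only OrderNumber appears in the post; the other properties are assumptions.
public class Sale
{
    public string OrderNumber { get; set; }
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}

public static class InvoiceTextParser
{
    // Assumed labels. Multiline makes ^ match at the start of every line, so "Subtotal:" won't match "Total:".
    private static readonly Regex NumberPattern = new Regex(@"^Invoice Number:\s*(\S+)", RegexOptions.Multiline);
    private static readonly Regex DatePattern = new Regex(@"^Invoice Date:\s*(.+)$", RegexOptions.Multiline);
    private static readonly Regex TotalPattern = new Regex(@"^Total:\s*\$?([\d,]+\.\d{2})", RegexOptions.Multiline);

    public static Sale Parse(string text)
    {
        return new Sale
        {
            OrderNumber = FindValue(NumberPattern, text),
            // Trim removes a trailing '\r' if the file happens to use Windows line endings.
            OrderDate = DateTime.ParseExact(FindValue(DatePattern, text).Trim(), "MMMM d, yyyy", CultureInfo.InvariantCulture),
            // NumberStyles.Number accepts thousands separators such as "1,250.00".
            Total = decimal.Parse(FindValue(TotalPattern, text), NumberStyles.Number, CultureInfo.InvariantCulture)
        };
    }

    public static Sale ParseFile(string path) => Parse(File.ReadAllText(path));

    // Throws when a label is missing, so a malformed invoice stops the import instead of saving empty values.
    private static string FindValue(Regex pattern, string text)
    {
        Match m = pattern.Match(text);
        if (!m.Success)
        {
            throw new FormatException($"Pattern not found: {pattern}");
        }
        return m.Groups[1].Value;
    }
}
```

Calling InvoiceTextParser.ParseFile on each file returned by Directory.GetFiles(txtsFolder, "*.txt") would then give the list of sales to send to the server.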

 

Finally, I sent them to my own system. My server has plenty of business logic that then handles the stock and accounting calculations.

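    // Requires: System, System.Configuration, System.Linq, System.Net.Http, System.Text, Newtonsoft.Json.
    // CreateHttpClient() and shortSales (the sales already on the server) come from code not shown in this post.
    HttpClient client = CreateHttpClient();

    // The endpoint is the same for every sale, so build it once.
    string url = ConfigurationManager.AppSettings["base-inventory-url"]
                 + ConfigurationManager.AppSettings["sale-add-url"];

    foreach (Sale sale in sales)
    {
        // Skip invoices that were already imported, so the tool can be re-run safely.
        bool exists = shortSales.Any(x => x.OrderNumber == sale.OrderNumber);
        if (exists)
        {
            continue;
        }

        Console.WriteLine($"Sending {sale.OrderNumber} to server.");
        var content = new StringContent(JsonConvert.SerializeObject(sale), Encoding.UTF8, "application/json");

        // await instead of .Result; the enclosing method must be declared async.
        using (HttpResponseMessage response = await client.PostAsync(url, content))
        {
            if (!response.IsSuccessStatusCode)
            {
                // Print the server's error body so the failed invoice can be fixed and re-sent.
                string result = await response.Content.ReadAsStringAsync();
                Console.WriteLine($"Failed ({(int)response.StatusCode}): {result}");
            }
            else
            {
                Console.WriteLine("Result code : " + response.StatusCode);
            }
        }
    }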
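The duplicate check above calls shortSales.Any once per sale, which scans the whole list every time. For a large import, a HashSet of existing order numbers does the same job with a constant-time lookup. Here is a minimal sketch; the ImportFilter class and ExcludeExisting method are hypothetical names. Unlike the original loop, it also skips an order number that appears twice in the same batch.

```csharp
using System;
using System.Collections.Generic;

public static class ImportFilter
{
    // Returns the items whose order number is not already known, preserving their original order.
    public static List<T> ExcludeExisting<T>(
        IEnumerable<T> items,
        Func<T, string> orderNumber,
        IEnumerable<string> existingOrderNumbers)
    {
        // Ordinal comparison matches the == check used in the original loop.
        var seen = new HashSet<string>(existingOrderNumbers, StringComparer.Ordinal);
        var result = new List<T>();
        foreach (T item in items)
        {
            // Add returns false when the order number is already in the set,
            // which also drops repeats within the current batch.
            if (seen.Add(orderNumber(item)))
            {
                result.Add(item);
            }
        }
        return result;
    }
}
```

In the loop above, it could replace the Any check: foreach (Sale sale in ImportFilter.ExcludeExisting(sales, s => s.OrderNumber, shortSales.Select(x => x.OrderNumber))).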


So, if you are already using Wave and need to extract your data, feel free to use the code above. You can also contact me for professional support.
Thanks. 


About the Author: Foyzul Karim
