Location: ENR2 S230
Presenter: Dr. Marek Rychlik, UA Mathematics
Title: OCR, Soft K-Means, CLSTM and RAID

The purpose of this talk is to share with the TRIPODS groups at large several problems and research directions, in various phases of progress. We hope to find other researchers involved in similar research and to identify potential for collaboration.

OCR (optical character recognition): The current thrust of this project is to develop algorithms for annotating a large collection of scanned text in the Pashto language (spoken in Afghanistan and Pakistan). This project is a collaboration with Yan Han at Library Sciences. The best algorithm is based on CLSTM (Context Long Short-Term Memory), an algorithm descended from LSTM, which is used in, among other applications, Amazon's Alexa.

Soft K-Means and separation of mixtures: The problem of populations which are "mixtures" of subpopulations goes back to the work of Pearson. Soft K-Means is an algorithm closely related to the EM (Expectation-Maximization) algorithm. Soft K-Means is more suitable for many problems than the well-known K-Means algorithm.

RAID (Redundant Array of Independent Disks) is a method of combining multiple disk drives into one device with better throughput and error-correcting capabilities. This part of my talk is mostly on our experience commercializing the RAID technology invented by Mohamad Moussa and Marek Rychlik.
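
For readers unfamiliar with Soft K-Means, the idea mentioned above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the formulation used in the talk: the data are one-dimensional, the function name `soft_kmeans_1d` and the stiffness parameter `beta` are hypothetical, and the soft assignments use a simple exponential weighting of squared distances. Unlike ordinary K-Means, every point contributes to every center, with a weight that decays with distance.

```python
import math

def soft_kmeans_1d(points, centers, beta=1.0, iterations=50):
    """Illustrative Soft K-Means on 1-D data (beta is an assumed stiffness parameter)."""
    centers = list(centers)
    resp = []
    for _ in range(iterations):
        # Assignment step: each point gets a soft responsibility for each center.
        resp = []
        for x in points:
            d2 = [(x - c) ** 2 for c in centers]
            dmin = min(d2)  # subtract the minimum for numerical stability
            w = [math.exp(-beta * (d - dmin)) for d in d2]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # Update step: each center moves to the responsibility-weighted mean.
        for k in range(len(centers)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                centers[k] = sum(r[k] * x for r, x in zip(resp, points)) / mass
    return centers, resp

# Made-up sample: a mixture of two well-separated subpopulations.
points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, resp = soft_kmeans_1d(points, centers=[0.0, 6.0], beta=2.0)
```

On this made-up sample the two centers settle close to the two subpopulation means. A small `beta` blurs the assignments, while a very large `beta` makes them nearly hard, which approaches ordinary K-Means.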