Project Proposal

Here, we briefly describe the project proposal as given by the guidelines.


Title: Parallelized PCYK Algorithm for Syntactic Parsing
URL: https://thepulkitagarwal.github.io/pcyk/
Members: Ansong Ni (ansongn) & Pulkit Agarwal (pulkitag)

Summary

In this project, we aim to parallelize the PCYK parsing algorithm for syntactic parsing on Nvidia GPUs (for CUDA) and Multicore Machines (for OpenMP / Cilk).

Background

The Probabilistic Cocke-Kasami-Younger algorithm (PCYK1) algorithm is a dynamic programming algorithm for parsing sentences with Chomsky Normal Form (CNF).

An illustration of the PCYK algorithm for syntactic parsing for a natural language sentence “Fish people fish tanks” is shown like in the figure below.

PCYK Algorithm Example
Figure 1: An example of the PCYK algorithm for syntactic parsing2
 

It is a bottom-up algorithm that iteratively builds up the binary parse tree from the words as terminal (i.e. leaf) nodes. The appearance binary and unary parse rules are counted during the training time so it could be used to estimate the probability of a parse tree at test time. After performing the PCYK parsing and obtaining the parsing chart, the best parse tree is generated through a greedy search of the most probable child from the root node.

Different parts of this algorithm can benefit from parallelism, which we will describe with more detail in the challenges section.

The Challenges

Goals

Plan to Achieve

For the CUDA Implementation of the PCYK Algorithm, the paper Efficient Parallel CKY Parsing on GPUs by Yi et. al. achieves 26x speedup. We aim to achieve a similar speedup on the GHC machines.

Hope to Achieve

Deliverables

Poster Session

Platform Choice

We plan to use CUDA (GPU Architecture) and OpenMP / Cilk (Shared Memory Multicore Architecture).

Resources

Hardware Resources

The main resource we would require are Nvidia GPUs (for CUDA) and Multicore Machines (for OpenMP / Cilk). We believe that the GHC machines we used for our assignments should be good enough for our project.

Planned Schedule

# Dates Plan
1 Oct 27 - Nov 02 Meeting with Course Instructors, Project Proposal
2 Nov 03 - Nov 09 Sequential Version of PCYK
3 Nov 10 - Nov 16 Cuda Version of PCYK
4 Nov 17 - Nov 23 Optimizations in the Cuda Version of PCYK
5 Nov 24 - Nov 30 Optimizations in the Cuda Version of PCYK
6 Dec 01 - Dec 07 Hope to Achieve Goals
7 Dec 08 - Dec 10 Final Report + Poster Session

  1. The PCYK algorithm is also known as the CYK algorithm or the CKY algorithm in the NLP Community. 

  2. Example borrowed from the slides of the CMU Algorithms for NLP course