• Get the logical structure of an academic article without using libraries

    ShikhaTan Member

    Scientific article PDFs share a common structure: paper title, Abstract, Introduction, Proposed scheme, Experiments, Conclusion, References, and so on. Each of these can contain sections and subsections.
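    A first, simple heuristic follows from this list: many top-level headings are one of a small set of well-known names, possibly preceded by a number. The sketch below is a hypothetical helper (`CanonicalSections.TryMatch` is an invented name, not an existing API). It strips an optional leading number such as "1", "2.3", "IV." or "A." and looks the rest up in a dictionary. The dictionary contents are an assumption taken from the names above and should be extended for your own corpus.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: recognises well-known top-level headings of a paper.
public static class CanonicalSections
{
    // Assumed list of heading names (lower-case keys); extend it for your corpus.
    private static readonly Dictionary<string, string> Known = new Dictionary<string, string>
    {
        ["abstract"] = "Abstract",
        ["introduction"] = "Introduction",
        ["proposed scheme"] = "Proposed Scheme",
        ["experiments"] = "Experiments",
        ["conclusion"] = "Conclusion",
        ["conclusions"] = "Conclusion",
        ["references"] = "References",
    };

    // Optional leading number: "1", "1.", "2.3", a Roman numeral such as "IV."
    // or a letter such as "A.", followed by whitespace.
    private static readonly Regex LeadingNumber =
        new Regex(@"^(?:\d+(?:\.\d+)*\.?|[IVXLC]+\.|[A-Z]\.)\s+");

    // Returns true if the line is a known heading; canonical receives its normalised name.
    public static bool TryMatch(string line, out string canonical)
    {
        string text = LeadingNumber.Replace(line.Trim(), "");
        string key = text.Trim().TrimEnd(':', '.').ToLowerInvariant();
        return Known.TryGetValue(key, out canonical);
    }
}
```

    For example, "1 Introduction" and "I. INTRODUCTION" both map to "Introduction", while "2.1 Dataset" is not recognised and would need the numbering or layout heuristics instead.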

    How can I take a PDF document as input and extract its structure, i.e. its sections and subsections, together with the text under each of them?

    For example, if the PDF document looks like this:
    Article Title
    Abstract
    sentences

    Introduction
    section-1
    subsection-1
    subsection-2
    Proposed scheme
    section-1
    subsection-1

    and so on.
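    The hierarchy in this example is a tree, so once each line has been tagged as body text (level 0) or as a heading of level 1, 2, 3, ..., it can be rebuilt with a stack. This is a minimal sketch: `SectionNode` and `StructureBuilder` are hypothetical names, and it assumes the heading levels have already been detected by some other means (numbering, font size, known section names, and so on).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// One node of the logical structure: a heading, its body text and its children.
public sealed class SectionNode
{
    public string Title { get; }
    public int Level { get; }   // 0 = document root, 1 = section, 2 = subsection, ...
    public StringBuilder Body { get; } = new StringBuilder();
    public List<SectionNode> Children { get; } = new List<SectionNode>();

    public SectionNode(string title, int level) { Title = title; Level = level; }
}

public static class StructureBuilder
{
    // Input: lines tagged with a heading level (> 0), or 0 for body text.
    public static SectionNode Build(IEnumerable<(int Level, string Text)> lines, string documentTitle)
    {
        var root = new SectionNode(documentTitle, 0);
        var open = new Stack<SectionNode>();
        open.Push(root);

        foreach (var (level, text) in lines)
        {
            if (level == 0)
            {
                // Body text belongs to the innermost open section.
                open.Peek().Body.AppendLine(text);
                continue;
            }
            // A new heading closes every open section at the same or a deeper level.
            while (open.Peek().Level >= level)
                open.Pop();
            var node = new SectionNode(text, level);
            open.Peek().Children.Add(node);
            open.Push(node);
        }
        return root;
    }

    // Prints the tree with two spaces of indentation per level.
    public static void Print(SectionNode node, int depth = 0)
    {
        Console.WriteLine(new string(' ', depth * 2) + node.Title);
        foreach (var child in node.Children)
            Print(child, depth + 1);
    }
}
```

    Feeding it the example above as (1, "Abstract"), (0, "sentences"), (1, "Introduction"), (2, "section-1"), (3, "subsection-1"), ... gives a root "Article Title" with three children. The root has level 0, so it is never popped and always collects the top-level sections.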

    I am coding in C#.

    I want the logical structure of the document. Note that scientific articles generally share a similar logical structure.

    I have heard of some heuristic approaches, but because of time constraints I am looking for code. Can anyone please help?
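    One common heuristic uses section numbering: "3" is a section, "3.2" a subsection, "3.2.1" a sub-subsection. The sketch below, a hypothetical `HeadingHeuristics.GuessLevel`, combines that with a few labelled assumptions: headings are short (under 80 characters here), they do not end with a period, and section numbers have at most two digits, which filters out lines such as "2017 IEEE ...". It will still produce false positives and misses unnumbered headings, so treat it as a starting point.

```csharp
using System.Text.RegularExpressions;

// Hypothetical heading detector based on section numbering.
public static class HeadingHeuristics
{
    // Assumption: headings are short lines.
    public const int MaxHeadingLength = 80;

    // "3", "3.", "3.2" or "3.2.1", then whitespace and a capital letter.
    private static readonly Regex Numbered = new Regex(@"^(\d+(?:\.\d+)*)\.?\s+[A-Z]");

    // Returns 1 for a section, 2 for a subsection, ... or 0 for body text.
    public static int GuessLevel(string line)
    {
        string text = line.Trim();
        if (text.Length == 0 || text.Length > MaxHeadingLength) return 0;
        if (text.EndsWith(".")) return 0;   // body sentences usually end with a period

        Match m = Numbered.Match(text);
        if (!m.Success) return 0;

        string[] parts = m.Groups[1].Value.Split('.');
        // Assumption: section numbers have at most two digits (rejects "2017 IEEE ...").
        if (parts[0].Length > 2) return 0;
        return parts.Length;                 // "3" -> 1, "3.2" -> 2, "3.2.1" -> 3
    }
}
```

    The returned level can be fed straight into a stack-based tree builder. If you can also recover font size or boldness from the PDF, those are usually stronger signals than numbering alone.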

    I am not allowed to use any publicly available libraries, such as ParsCit. This is for my project work.
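    Without libraries, the hardest part is getting text out of the PDF at all: page content is usually stored in streams compressed with the FlateDecode filter (zlib). Assuming the .NET base class library itself is allowed, `System.IO.Compression.DeflateStream` can inflate such a stream once its 2-byte zlib header is skipped. The sketch below (hypothetical class `PdfStreams`) locates `stream ... endstream` blocks by plain text search instead of reading each object's `/Length`, then pulls out literal strings drawn with the `Tj` operator. It ignores `TJ` arrays, other filters, escape sequences and font encodings, so on many real PDFs it will return incomplete or still-encoded text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, heavily simplified PDF stream reader (not a full PDF parser).
public static class PdfStreams
{
    // Latin-1 maps each byte to exactly one char, so string indices equal byte offsets.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Finds every "stream ... endstream" block and returns the ones that inflate.
    public static List<string> InflateAll(byte[] pdf)
    {
        var results = new List<string>();
        string raw = Latin1.GetString(pdf);
        int pos = 0;
        while (true)
        {
            int kw = raw.IndexOf("stream", pos, StringComparison.Ordinal);
            if (kw < 0) break;
            int end = raw.IndexOf("endstream", kw + 6, StringComparison.Ordinal);
            if (end < 0) break;

            // The "stream" keyword is followed by CRLF or LF before the data.
            int dataStart = kw + 6;
            if (dataStart < raw.Length && raw[dataStart] == '\r') dataStart++;
            if (dataStart < raw.Length && raw[dataStart] == '\n') dataStart++;

            if (end - dataStart > 2)
            {
                try
                {
                    // Skip the 2-byte zlib header: DeflateStream expects raw deflate data.
                    using var input = new MemoryStream(pdf, dataStart + 2, end - dataStart - 2);
                    using var inflater = new DeflateStream(input, CompressionMode.Decompress);
                    using var output = new MemoryStream();
                    inflater.CopyTo(output);
                    results.Add(Latin1.GetString(output.ToArray()));
                }
                catch (InvalidDataException)
                {
                    // Not Flate-compressed (e.g. an image); skip it.
                }
            }
            pos = end + 9; // continue after "endstream"
        }
        return results;
    }

    // Literal strings shown with Tj, e.g. "(1 Introduction) Tj"; nested parentheses are not handled.
    private static readonly Regex ShowText = new Regex(@"\(((?:\\.|[^\\()])*)\)\s*Tj");

    public static List<string> ShowTextStrings(string content)
    {
        var strings = new List<string>();
        foreach (Match m in ShowText.Matches(content))
            strings.Add(m.Groups[1].Value);
        return strings;
    }
}
```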

  • Amit Member

    One thing to note is that most of these articles are written in LaTeX using a publisher-specific template; for example, have a look at this one for IEEE articles:

    https://www.sharelatex.com/templates/journals/ieee-journal
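    The template conventions give extra cues for the heuristics. In IEEE-style papers, section headings are typically numbered with Roman numerals ("I. INTRODUCTION"), subsections with capital letters ("A. ..."), and sub-subsections with "1)". The sketch below turns that into a level guess; the patterns are assumptions based on the usual IEEEtran output, so check them against your own PDFs. Expect some ambiguity: a subsection lettered "I." or "V." looks like a Roman numeral, and a body line starting with an initial such as "A. Smith" would be misread as a subsection.

```csharp
using System.Text.RegularExpressions;

// Hypothetical level guesser for IEEE-template headings.
public static class IeeeHeadings
{
    // Section: Roman numeral + period, e.g. "IV. EXPERIMENTS".
    private static readonly Regex Section = new Regex(@"^[IVXL]+\.\s+\S");
    // Subsection: single capital letter + period, e.g. "B. Dataset".
    private static readonly Regex Subsection = new Regex(@"^[A-Z]\.\s+\S");
    // Sub-subsection: Arabic number + closing parenthesis, e.g. "2) Training:".
    private static readonly Regex Subsubsection = new Regex(@"^\d+\)\s+\S");

    // Returns 1, 2 or 3 for a heading, or 0 for body text.
    public static int GuessLevel(string line)
    {
        string text = line.Trim();
        if (Section.IsMatch(text)) return 1;       // checked first, so "I." counts as a section
        if (Subsection.IsMatch(text)) return 2;
        if (Subsubsection.IsMatch(text)) return 3;
        return 0;
    }
}
```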
