Tutorial


The easiest way to get started is to use some data from the Projects folder: take a look at one of the databases, use it, and get your first result. If you would like to run your own study with documents from www.nsf.org, go to their website, run your own search, download the documents as a .csv file, run this script to clean your data, upload the output, and get your result.
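The cleaning script mentioned above is not reproduced here. To give a rough idea of what such a step involves, the Python sketch below keeps an amount column and a text column from an NSF-style export and writes them in the two-column layout described in Step 4. The column names `AwardedAmountToDate` and `Abstract`, the unit `$`, and the default title are assumptions: check the header of your own download and adjust them.

```python
import csv
import io
import re

# Hypothetical column names: check the header of your own NSF export and adjust.
AMOUNT_COL = "AwardedAmountToDate"
TEXT_COL = "Abstract"

# Digits with an optional decimal part separated by a dot.
PLAIN_NUMBER = re.compile(r"\d+(\.\d+)?")


def to_plain_number(raw):
    """Turn a string like '$1,234,567.00' into '1234567.00' (digits and at most one dot)."""
    cleaned = re.sub(r"[^0-9.]", "", raw)
    if not PLAIN_NUMBER.fullmatch(cleaned):
        raise ValueError(f"cannot parse amount: {raw!r}")
    return cleaned


def nsf_to_nlq(src, dst, unit="$", title="NSF abstracts"):
    """Read an NSF-style CSV from `src` and write the two-column layout to `dst`."""
    if "," in title:
        raise ValueError("the title must not contain commas")
    writer = csv.writer(dst)
    writer.writerow([unit, title])  # first row: unit, then description of the texts
    for row in csv.DictReader(src):
        writer.writerow([to_plain_number(row[AMOUNT_COL]), row[TEXT_COL]])


def convert_text(csv_text, **kwargs):
    """Convenience wrapper: convert CSV given as a string and return the output string."""
    out = io.StringIO()
    nsf_to_nlq(io.StringIO(csv_text), out, **kwargs)
    return out.getvalue()
```

For real files, open both the input and the output with `open(path, newline="")` and pass the file objects to `nsf_to_nlq`. `csv.writer` wraps texts that contain commas in quotes, which standard CSV readers handle; whether the NLQ upload accepts this quoting is an assumption.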

If you would like to do something different, on your own: Congratulations! I have tried to make it as easy as possible; just follow these steps:

    Step 1: Idea

    You need an idea.

    Step 2: Reality

    You need a realistic idea: can it be measured by a number and described in words?

    Step 3: Action

    You need to collect these words and numbers: that's your job. (If you do have a realistic idea and would like to know where to start, or what we can do for you, please contact us.)

    Step 4: Data

    Your data must be in the correct format: a .csv file with exactly two columns, your numbers and the texts describing your activity.

    First column: Numbers. The first element of the first column is the unit (for instance M$, votes, likes, shares, millions of viewers). Units are important in science, and you need one to perform NLQ; finding your unit might be the first step of the process. Then come your numbers: digits only, with no other characters such as commas. Dots for decimals are OK.
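Before uploading, a quick local check of the first column can save a round trip. The Python sketch below encodes the rules of this step as we read them: every row has exactly two columns, the first row carries a unit, and every later row starts with a plain number (digits, optionally a dot and more digits). The function names and the UTF-8 encoding are assumptions, not part of NLQ.

```python
import csv
import re

# Digits only, with an optional dot for decimals: "12" and "3.75" pass, "1,200" does not.
NUMBER = re.compile(r"\d+(\.\d+)?")


def check_rows(rows):
    """Return a list of problems found in parsed CSV rows; [] means the layout looks right."""
    if not rows:
        return ["file is empty"]
    problems = []
    for i, row in enumerate(rows, start=1):
        if len(row) != 2:
            problems.append(f"row {i}: {len(row)} columns, expected 2")
            continue
        value = row[0]
        if i == 1:
            # First row: the unit, e.g. "M$" or "votes".
            if not value.strip():
                problems.append("row 1: missing unit in the first column")
        elif not NUMBER.fullmatch(value):
            problems.append(f"row {i}: {value!r} is not a plain number")
    return problems


def check_file(path):
    """Parse a .csv file (assumed UTF-8, BOM tolerated) and check its layout."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return check_rows(list(csv.reader(f)))
```

A row reported with three or more columns usually means a text contains commas without being quoted, so the CSV reader split it.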

    Second column: Texts. The first element of the second column is the description of your texts (no commas in the title, please). Then the texts come as they are (all characters are welcome here). We do the cleaning for you: we remove the HTML tags, perform the lightest possible stemming (removing the "s" at the end of words), create a term-document matrix, solve for the posterior distribution using collapsed Gibbs sampling, draw the per-word topic assignments and per-document topic proportions, quantify how much is generated by each topic, and write up a summary report. That's our job.
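The first cleaning steps listed above (removing HTML tags, light stemming, building a term-document matrix) can be sketched in a few lines of Python. This illustrates the described steps and is not NLQ's own code: the tokenizer (lowercase alphabetic words only) is an assumption, and the stemmer is deliberately as light as the text says, so it also turns words like "this" into "thi".

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
WORD = re.compile(r"[a-z]+")   # assumed tokenizer: lowercase alphabetic runs


def clean(text):
    """Strip HTML tags, lowercase, and split into alphabetic tokens."""
    return WORD.findall(TAG.sub(" ", text).lower())


def light_stem(token):
    """The 'lightest stemming': drop a single trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 1 else token


def term_document_matrix(texts):
    """Return (vocab, matrix) where matrix[d][j] counts vocab[j] in document d."""
    docs = [Counter(light_stem(t) for t in clean(text)) for text in texts]
    vocab = sorted(set().union(*docs))
    matrix = [[doc[w] for w in vocab] for doc in docs]
    return vocab, matrix
```

For example, `term_document_matrix(["<p>Rivers and lakes</p>", "A river"])` gives the vocabulary `["a", "and", "lake", "river"]`, with "rivers" and "river" counted as the same term.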

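The modelling part of the pipeline can be illustrated with a compact collapsed Gibbs sampler. The text does not name the model; the sketch assumes latent Dirichlet allocation (LDA) with symmetric priors `alpha` and `beta`, the usual setting for collapsed Gibbs sampling. Each document is given as a list of word ids (indices into the vocabulary). The helper `amount_per_topic` is a hypothetical reading of "quantify how much is generated by each topic": it splits each document's number across topics according to that document's topic proportions, and NLQ's actual method may differ.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (z, theta): per-word topic assignments from the last sweep and
    per-document topic proportions estimated from the final counts.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total words per topic
    z = []
    for d, doc in enumerate(docs):                      # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current word's assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional: p(z=t | rest) ∝ (n_dt + α)(n_tw + β) / (n_t + Vβ).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return z, theta


def amount_per_topic(values, theta):
    """Hypothetical allocation: split each document's number across topics by theta[d]."""
    totals = [0.0] * len(theta[0])
    for value, props in zip(values, theta):
        for t, p in enumerate(props):
            totals[t] += value * p
    return totals
```

In practice you would run more sweeps, discard the early ones as burn-in, and possibly average the estimates over several samples; the fixed `seed` only makes runs reproducible. Because each row of `theta` sums to one, the topic totals add up to the sum of the input numbers.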

    Step 5: Run

    If you made it here, once again: Congratulations! Go to the ds4all.io/nlq main page, upload your .csv file, run NLQ, and get your results!