SentText was created by Johanna Dangel (johanna.dangel@stud.uni-regensburg.de) as part of her thesis in media informatics at the University of Regensburg. Find out more about her and her work on her GitHub Page.

The thesis was supervised and supported by Thomas Schmidt (thomas.schmidt@ur.de), who will also maintain SentText. More information about Thomas Schmidt can be found here.

Citation Information

If you use SentText in any form for research, please cite the following paper, which also describes the development process in more detail:

Schmidt, T., Dangel, J. & Wolff, C. (2021). SentText: A Tool for Lexicon-based Sentiment Analysis in Digital Humanities. In: Schmidt, T. & Wolff, C. (Eds.), Information between Data and Knowledge. Information Science and its Neighbors from Data Science to Digital Humanities. Proceedings of the 16th International Symposium of Information Science (ISI 2021) (pp. 156–172). Glückstadt: Verlag Werner Hülsbusch. DOI: 10.5283/epub.44943 [pdf] [bibtex]

The above paper was presented at the 16th International Symposium of Information Science (ISI 2021). Find more information about the conference here. You can access the entire proceedings here.

The presentation on SentText given at ISI 2021 is available on YouTube.

Supported languages

Currently, only the analysis of German-language texts is supported.


File import and export

You can analyze both .xml and .txt files. To upload more than one file, hold Ctrl (Strg on German keyboards) while selecting the files.

Results can be downloaded as a .csv file or as an .xml file (in a specific format). Charts can be downloaded as .png files. Note that exported decimal numbers use '.' as the decimal separator.


Lemmatization

Lemmatization reduces a word to its base form (lemma). For example, the inflected form "loving" is reduced to the base form "love". Studies show that lemmatization can improve the result of the analysis, especially for literary texts (Schmidt & Burghardt, 2018).

German lemmatization is performed with textblob. We currently do not offer lemmatization support for other languages.

If lemmatization is enabled, the word is first looked up in the dictionary in its current inflected form. If it is not found in that form, the lemma of the word is looked up instead. Note that the correct lemma is not always found, and there is a risk that verbs are nominalized (for example, "laufen" becomes "(das) Laufen"). For this reason, a case-insensitive analysis is recommended.
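
The lookup order described above can be sketched as follows. This is a minimal illustration, not SentText's actual code: the function name lookup_polarity, the toy lexicon values, and the toy lemmatizer are assumptions made for the example. The lemmatizer is passed in as a plain function so that the sketch does not depend on any particular library.

```python
from typing import Callable, Dict, Optional


def lookup_polarity(
    word: str,
    lexicon: Dict[str, float],
    lemmatize: Optional[Callable[[str], str]] = None,
) -> Optional[float]:
    """Return the polarity of `word`, or None if it is not in the lexicon."""
    # Step 1: try the inflected form exactly as it appears in the text.
    if word in lexicon:
        return lexicon[word]
    # Step 2: only if lemmatization is enabled, fall back to the lemma.
    if lemmatize is not None:
        return lexicon.get(lemmatize(word))
    return None


# Toy data for illustration only; these are not real lexicon values.
toy_lexicon = {"lieben": 0.5, "geliebt": 0.6}
toy_lemmas = {"liebte": "lieben", "geliebt": "lieben"}


def toy_lemmatize(word: str) -> str:
    """Hypothetical stand-in for a real German lemmatizer."""
    return toy_lemmas.get(word, word)


print(lookup_polarity("geliebt", toy_lexicon, toy_lemmatize))  # inflected form is found first
print(lookup_polarity("liebte", toy_lexicon, toy_lemmatize))   # falls back to the lemma
print(lookup_polarity("liebte", toy_lexicon))                  # lemmatization disabled
```

Trying the inflected form first means that a lexicon entry for a specific word form takes precedence over the entry for its lemma.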


Lemma attribute in .xml files

An XML file may already contain lemma information. In this case, you can specify the name of the attribute in which the lemma is stored under "advanced user options" in the upload dialog. An example of such a file can be found at the Deutsches Textarchiv and downloaded as XML (TEI P5 incl. att.linguistic).

Lemmatization workflow when this field is set: if the word has the specified lemma attribute, that lemma is used. If the word does not have the attribute, the application checks whether general lemmatization is enabled.
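
As a rough sketch of this workflow, assuming each word is wrapped in its own XML element: the element name "w", the sample document, the function name resolve_lookup_form, and the toy lemmatizer are all assumptions for illustration and do not reflect SentText's actual implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def resolve_lookup_form(
    element: ET.Element,
    lemma_attribute: Optional[str],
    lemmatize: Optional[Callable[[str], str]] = None,
) -> str:
    """Decide which form of a word element is used for the analysis."""
    word = (element.text or "").strip()
    # 1. Prefer the lemma stored in the XML attribute, if a name was specified.
    if lemma_attribute:
        stored = element.get(lemma_attribute)
        if stored:
            return stored
    # 2. Otherwise use general lemmatization, if it is enabled.
    if lemmatize is not None:
        return lemmatize(word)
    # 3. Otherwise keep the word as written.
    return word


def toy_lemmatize(word: str) -> str:
    """Hypothetical stand-in for a real German lemmatizer."""
    return {"liefen": "laufen"}.get(word, word)


# Hypothetical sample: only the first word carries a lemma attribute.
sample = ET.fromstring('<s><w lemma="gehen">ging</w><w>liefen</w><w>schnell</w></s>')
forms = [resolve_lookup_form(w, "lemma", toy_lemmatize) for w in sample.iter("w")]
print(forms)
```

Real TEI files usually declare XML namespaces, so element names would have to be namespace-qualified when searching such a document.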


Case sensitivity of the Sentiment Analysis

The application offers both a case-sensitive and a case-insensitive analysis. In the case-insensitive analysis, both the words to be examined and the lexicon entries are converted to lower case before they are compared. This behavior can be configured in the advanced user settings. Note that lemmatizing German verbs can turn them into nouns, which are capitalized in German. It is therefore recommended to use case-insensitive word matching whenever lemmatization is enabled.
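
The difference between the two modes can be illustrated with a short sketch. The function name match_word and the lexicon value are assumptions for this example only.

```python
from typing import Dict, Optional


def match_word(word: str, lexicon: Dict[str, float], case_sensitive: bool) -> Optional[float]:
    """Look up `word` in `lexicon`, optionally ignoring case on both sides."""
    if case_sensitive:
        return lexicon.get(word)
    # Lower-case both the lexicon entries and the word before comparing.
    lowered = {entry.lower(): value for entry, value in lexicon.items()}
    return lowered.get(word.lower())


toy_lexicon = {"Freude": 0.6}  # illustrative value, not taken from a real lexicon

print(match_word("freude", toy_lexicon, case_sensitive=True))   # no match
print(match_word("freude", toy_lexicon, case_sensitive=False))  # matches "Freude"
```

In this sketch, two lexicon entries that differ only in case collapse into one entry after lower-casing, and the later one wins. A real implementation would also lower-case the lexicon once rather than on every lookup.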


Stop words

Stop words are very frequent words that carry little semantic meaning. To improve the speed of the analysis, they can optionally be excluded from the sentiment lookup. Both the base forms of stop words and their inflections are taken into account. The check whether a word is a stop word is always case-insensitive.

You can also add your own stop words to the list.

List of German stop words (Bird, Loper & Klein, 2009; customized)
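
A minimal sketch of such a stop word check, assuming the text has already been split into tokens. The function name tokens_to_check is hypothetical, and DEFAULT_STOP_WORDS is only a small excerpt of the default list.

```python
from typing import Iterable, List, Set

# Small excerpt of the default German stop word list, for illustration.
DEFAULT_STOP_WORDS: Set[str] = {"aber", "der", "die", "das", "und", "sehr", "war"}


def tokens_to_check(tokens: Iterable[str], custom_stop_words: Iterable[str] = ()) -> List[str]:
    """Return the tokens that would still be looked up in the sentiment lexicon."""
    # Merge default and user-supplied stop words; compare in lower case only.
    stop_words = {w.lower() for w in DEFAULT_STOP_WORDS}
    stop_words |= {w.lower() for w in custom_stop_words}
    return [t for t in tokens if t.lower() not in stop_words]


tokens = ["Der", "Film", "war", "sehr", "schön"]
print(tokens_to_check(tokens))
print(tokens_to_check(tokens, custom_stop_words=["Film"]))
```

Because the comparison is done in lower case, "Der" at the start of a sentence is recognized as the stop word "der".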

aber, alle, allem, allen, aller, alles, als, also, am, an, ander, andere, anderem, anderen, anderer, anderes, anderm, andern, anderst, anders, auch, auf, aus, bei, bin, bis, bist, da, damit, dann, der, den, des, dem, die, das, dass, daß, derselbe, derselben, denselben, desselben, demselben, dieselbe, dieselben, dasselbe, dazu, dein, deine, deinem, deinen, deiner, deines, denn, derer, dessen, dich, dir, du, dies, diese, diesem, diesen, dieser, dieses, doch, dort, durch, ein, eine, einem, einen, einer, eines, einig, einige, einigem, einigen, einiger, einiges, einmal, er, ihn, ihm, es, etwas, euer, eure, eurem, euren, eurer, eures, für, gegen, gewesen, hab, habe, haben, hat, hatte, hatten, hier, hin, hinter, ich, mich, mir, ihr, ihre, ihrem, ihren, ihrer, ihres, euch, im, in, indem, ins, ist, jede, jedem, jeden, jeder, jedes, jene, jenem, jenen, jener, jenes, jetzt, kann, können, könnte, machen, man, manche, manchem, manchen, mancher, manches, mein, meine, meinem, meinen, meiner, meines, mit, muss, musste, nach, noch, nun, nur, ob, oder, sehr, sein, seine, seinem, seinen, seiner, seines, selbst, sich, sie, ihnen, sind, so, solche, solchem, solchen, solcher, solches, soll, sollte, sondern, sonst, über, um, und, uns, unsere, unserem, unseren, unser, unseres, unter, viel, vom, von, vor, während, war, waren, warst, was, weg, weil, weiter, welche, welchem, welchen, welcher, welches, wenn, werde, werden, wie, wieder, will, wir, wird, wirst, wo, wollen, wollte, würde, würden, zu, zum, zur, zwar, zwischen


Negations

Negations are recognized in the immediate context of a sentiment-bearing word. The negation must be at most four tokens away from the sentiment-bearing word, and there must be no punctuation between the two. If a word is negated, its sentiment is shifted to the opposite polarity.

Scope of Negation Detection

Tom NOT love Julia: "love" is negative
Tom NOT very love Julia: "love" is negative
Tom NOT, love Julia: "love" is positive, because there is a punctuation mark between "not" and "love".
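
The scope rule can be sketched as below. This is an illustration under stated assumptions, not the tool's actual code: the sketch searches on both sides of the sentiment-bearing word, models the polarity shift as a simple sign flip, and uses only short excerpts of the negation and punctuation lists. The names is_negated and apply_negation are hypothetical.

```python
from typing import List, Set

NEGATIONS: Set[str] = {"nicht", "kein", "keine", "nie", "niemals", "ohne"}  # excerpt
PUNCTUATION: Set[str] = {",", ".", "!", "?", ";", ":"}                      # excerpt
PUNCTUATION_WORDS: Set[str] = {"aber", "weil", "und"}  # words treated as punctuation
MAX_DISTANCE = 4


def is_negated(tokens: List[str], index: int) -> bool:
    """Check whether tokens[index] has a negation within MAX_DISTANCE tokens."""
    barriers = PUNCTUATION | PUNCTUATION_WORDS
    for step in (-1, 1):  # assumption: look to the left and to the right
        for distance in range(1, MAX_DISTANCE + 1):
            pos = index + step * distance
            if pos < 0 or pos >= len(tokens):
                break
            token = tokens[pos].lower()
            if token in barriers:
                break  # punctuation between the two words blocks the negation
            if token in NEGATIONS:
                return True
    return False


def apply_negation(polarity: float, negated: bool) -> float:
    """Assumption: shifting to the other polarity is modeled as a sign flip."""
    return -polarity if negated else polarity


# German counterparts of the three examples above.
print(is_negated(["Tom", "nicht", "liebt", "Julia"], 2))
print(is_negated(["Tom", "nicht", "sehr", "liebt", "Julia"], 3))
print(is_negated(["Tom", "nicht", ",", "liebt", "Julia"], 3))
```

The inner loop stops at the first barrier it meets, so a negation behind a comma or behind "aber", "weil" or "und" never reaches the sentiment-bearing word.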

List of German negations (Weinrich, 2007)

kein, keine, keinem, keinen, keiner, keines, keins, nicht, nichts, nein, nie, niemals, niemand, nirgends, nirgendwo, nirgendwoher, nirgendwohin, keinesfalls, keineswegs, mitnichten, ohne, weder noch

Negation based on the morphological structure of the word

For example, 'unschön' (prefix 'un-') and 'herzlos' (suffix '-los').

List of German negating prefixes (Weinrich, 2007)

nicht-, un-, miß-, fehl-, schein-, in-, il-, im-, ir-, a-, an-, dis-, des-, beinahe-, möchtegern-, pseudo-, quasi-, non-

List of German negating suffixes (Weinrich, 2007)

-frei, -leer, -los, -losigkeit, -arm
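
This page does not describe how the prefix and suffix lists are applied internally. One plausible reading, sketched below purely as an assumption, is that a word missing from the lexicon is stripped of a negating prefix or suffix, the remainder is looked up, and its polarity is inverted. The function name, the list excerpts, and the lexicon values are hypothetical.

```python
from typing import Dict, Optional

NEGATING_PREFIXES = ("un", "miß", "non")    # excerpt of the prefix list
NEGATING_SUFFIXES = ("los", "frei", "arm")  # excerpt of the suffix list


def morphological_polarity(word: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Derive a polarity from a negating prefix or suffix (assumed behavior)."""
    lowered = word.lower()
    if lowered in lexicon:
        return lexicon[lowered]  # the full word is listed; nothing to derive
    for prefix in NEGATING_PREFIXES:
        stem = lowered[len(prefix):]
        if lowered.startswith(prefix) and stem in lexicon:
            return -lexicon[stem]  # assumption: negation flips the sign
    for suffix in NEGATING_SUFFIXES:
        stem = lowered[: -len(suffix)]
        if lowered.endswith(suffix) and stem in lexicon:
            return -lexicon[stem]
    return None


toy_lexicon = {"schön": 0.5, "herz": 0.3}  # illustrative values only

print(morphological_polarity("unschön", toy_lexicon))
print(morphological_polarity("herzlos", toy_lexicon))
print(morphological_polarity("Tisch", toy_lexicon))
```

A purely string-based check like this one would also fire on words that merely begin with the same letters as a prefix, which is one reason to treat it as a sketch only.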

Punctuation marks used in the sentiment analysis (Bird, Loper & Klein, 2009; customized)

' ! " # & '' ‚ ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ -- — „ “ `` ... « »

Words that are treated as punctuation

German: aber, weil, und


Use of a dictionary

To use your own lexicon, create a CSV file with the sentiment-bearing words in the first column and the corresponding polarity strengths in the second column. Use ';' as the separator. The lexicon should not have a header row. With a custom lexicon, SentText can in principle also be used with languages other than German.

Example:
decrease;-0.0048
love;0.4902
destructive;-0.0048

For the analysis to work correctly, a positive sentiment must be represented by a positive number.
A negative sentiment must be represented by a negative number (starting with a '-').
The decimal separator can be either '.' or ','.
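
To check a custom lexicon before uploading it, a file in this format could be read as sketched below. The function name load_lexicon is hypothetical, and the sketch only mirrors the format rules stated above (';' separator, no header, '.' or ',' as decimal separator), not SentText's own parser.

```python
import csv
import io
from typing import Dict, TextIO


def load_lexicon(handle: TextIO) -> Dict[str, float]:
    """Read a ';'-separated lexicon without a header into a dictionary."""
    lexicon: Dict[str, float] = {}
    for row in csv.reader(handle, delimiter=";"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        word, value = row[0].strip(), row[1].strip()
        # Accept both '.' and ',' as the decimal separator.
        lexicon[word] = float(value.replace(",", "."))
    return lexicon


# In-memory sample; the second entry uses ',' as the decimal separator.
sample = io.StringIO("decrease;-0.0048\nlove;0,4902\ndestructive;-0.0048\n")
print(load_lexicon(sample))
```

For a file on disk, the handle could be opened with open(path, newline="", encoding="utf-8"). A value that is not a number raises a ValueError, which points to the faulty entry early.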


What data will be saved?

The application does not save any data. As a consequence, all changes are lost as soon as you leave or reload the results page, so be sure to download everything you want to keep.


Contact

If you have questions or comments about the application, or have encountered a bug, please contact us at thomas.schmidt@ur.de.


Lexicons / Dictionaries

Two external German sentiment lexicons / dictionaries are available by default. For more information, please refer to the following papers:

Remus, R., Quasthoff, U., & Heyer, G. (2010). SentiWS - A Publicly Available German-language Resource for Sentiment Analysis. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 1168–1171. Valletta, Malta: European Language Resources Association.

Võ, M. L.-H., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin Affective Word List Reloaded (BAWL-R). Behavior Research Methods, 41(2), 534–539.

Resources

Bird, S., Loper, E. & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media Inc.

Weinrich, H. (2007). Textgrammatik der deutschen Sprache (4th ed.). Hildesheim: Georg Olms.