SentText was created by Johanna Dangel (johanna.dangel@stud.uni-regensburg.de) as part of her thesis in media informatics at the University of Regensburg. Find out more about her and her work on her GitHub Page.
The thesis was supervised and supported by Thomas Schmidt (thomas.schmidt@ur.de) and will also be maintained by him. More information about Thomas Schmidt can be found here.
Citation Information
The above paper was presented at the 16th International Symposium of Information Science (ISI 2021). Find more information about the conference here. You can access the entire proceedings here.
The presentation of SentText given at ISI 2021 is available on YouTube.
Supported languages
Currently, only the analysis of German-language texts is supported.
File import and export
You can analyze both .xml and .txt files. To upload more than one file, hold Ctrl/Strg while selecting.
Results can be exported as a .csv file or as an .xml file (in a specific format). The created charts can be downloaded as .png files. Please note that, in general, a '.' is used as the decimal separator.
Lemmatization
Lemmatization reduces a word to its base form (lemma). For example, the base form "love" is determined from the verb form "loving".
Studies show that the lemmatization of words can improve the result of the analysis, especially in literary texts (Schmidt & Burghardt, 2018).
For German, we use the lemmatization provided by TextBlob. We currently do not offer lemmatization support for other languages.
If lemmatization is enabled, the inflected form of a word, as it appears in the text, is looked up in the lexicon first. The lemma of the word is looked up only if this form is not present. Note that the correct lemma is not always found, and verbs may be nominalized (e.g., "laufen" → "(das) Laufen"). For this reason, a case-insensitive analysis is recommended when using lemmatization.
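The lookup order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and does not reproduce SentText's actual implementation. `lookup_sentiment` is a hypothetical helper. The lemmatizer is a hard-coded toy mapping (`toy_lemmatize`) standing in for TextBlob, and the polarity values are made up.

```python
from typing import Callable, Dict, Optional


def lookup_sentiment(
    token: str,
    lexicon: Dict[str, float],
    lemmatize: Callable[[str], str],
) -> Optional[float]:
    """Return the polarity of `token`, falling back to its lemma.

    1. Look up the inflected form exactly as it appears in the text.
    2. Only if that fails, look up the lemma returned by `lemmatize`.
    Returns None if neither form is in the lexicon.
    """
    if token in lexicon:
        return lexicon[token]
    return lexicon.get(lemmatize(token))


# Hypothetical stand-in for a real lemmatizer: maps a few known forms
# and returns every other token unchanged.
TOY_LEMMAS = {"liebte": "lieben", "läuft": "Laufen"}


def toy_lemmatize(token: str) -> str:
    return TOY_LEMMAS.get(token, token)


# Made-up polarity values, for illustration only.
lexicon = {"lieben": 0.5, "laufen": 0.1}

print(lookup_sentiment("lieben", lexicon, toy_lemmatize))  # 0.5 (inflected form found)
print(lookup_sentiment("liebte", lexicon, toy_lemmatize))  # 0.5 (found via the lemma)
print(lookup_sentiment("läuft", lexicon, toy_lemmatize))   # None ("Laufen" != "laufen")
```

The last call shows the nominalization problem. The toy lemmatizer turns "läuft" into "Laufen", and a case-sensitive lookup does not match that with the lexicon entry "laufen".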
Lemma attribute in .xml files
An XML file may already contain lemma information. In this case, you can specify the name of the attribute that stores the lemma under "advanced user options" when uploading. An example of such a file can be found in the Deutsches Textarchiv, where texts can be downloaded as XML (TEI P5 incl. att.linguistic).
If this field is filled in, lemmatization works as follows. If a word has the specified lemma attribute, that lemma is used. If the word does not have the lemma attribute, the general lemmatization is applied, provided it has been enabled.
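The following minimal Python sketch illustrates this workflow. It assumes the tokens are TEI `<w>` elements in the TEI namespace, as in exports from the Deutsches Textarchiv, and that `lemma` was entered as the attribute name. `token_lemmas` and the sample document are hypothetical, and SentText's real import may work differently.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Assumption: tokens are TEI <w> elements in the TEI P5 namespace.
TEI_W = "{http://www.tei-c.org/ns/1.0}w"


def token_lemmas(
    xml_text: str,
    lemma_attr: str,
    lemmatize: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Return the form used for the lexicon lookup of each <w> token."""
    forms: List[str] = []
    for w in ET.fromstring(xml_text).iter(TEI_W):
        surface = (w.text or "").strip()
        lemma = w.get(lemma_attr)
        if lemma is not None:
            forms.append(lemma)               # 1. lemma attribute present: use it
        elif lemmatize is not None:
            forms.append(lemmatize(surface))  # 2. general lemmatization enabled
        else:
            forms.append(surface)             # 3. keep the word as it appears
    return forms


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<w lemma="lieben">liebte</w> <w lemma="er">er</w> <w>Häuser</w>
</p></body></text></TEI>"""

print(token_lemmas(sample, "lemma"))             # ['lieben', 'er', 'Häuser']
print(token_lemmas(sample, "lemma", str.lower))  # ['lieben', 'er', 'häuser']
```

In the second call, `str.lower` is only a placeholder for a real lemmatizer. It shows that the fallback applies to the token without a lemma attribute.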
Case sensitivity of the Sentiment Analysis
The application offers both case-sensitive and case-insensitive analysis. In the case-insensitive analysis, both the words to be examined and the lexicon entries are converted to lower case before they are compared. This behavior can be changed in the advanced user settings. Note, however, that lemmatizing German verbs can turn them into nouns. Therefore, case-insensitive word matching is recommended when lemmatization is used.
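The difference between the two matching modes can be sketched like this. It is a minimal Python illustration that only shows the lower-casing step described above. The helper names (`build_lookup`, `match`) are hypothetical, and the lexicon values are made up.

```python
from typing import Dict, Optional


def build_lookup(lexicon: Dict[str, float], case_sensitive: bool) -> Dict[str, float]:
    """Prepare the lexicon for matching; lower-case all entries if case-insensitive."""
    if case_sensitive:
        return dict(lexicon)
    # If two entries differ only in case, the later one wins (an assumption).
    return {word.lower(): value for word, value in lexicon.items()}


def match(token: str, lookup: Dict[str, float], case_sensitive: bool) -> Optional[float]:
    """Look up a token, lower-casing it first in case-insensitive mode."""
    return lookup.get(token if case_sensitive else token.lower())


lexicon = {"laufen": 0.1, "Liebe": 0.5}  # made-up values
sensitive = build_lookup(lexicon, case_sensitive=True)
insensitive = build_lookup(lexicon, case_sensitive=False)

# "Laufen" is the nominalized lemma a lemmatizer may return for "läuft".
print(match("Laufen", sensitive, case_sensitive=True))     # None
print(match("Laufen", insensitive, case_sensitive=False))  # 0.1
print(match("liebe", insensitive, case_sensitive=False))   # 0.5
```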
Stop words
Stop words are very frequent words that carry little semantic meaning. To improve the performance (speed) of the analysis, they can optionally be excluded from the sentiment lookup. Both the base forms of a stop word and its inflections are taken into account. Checking whether a word is a stop word is always case-insensitive.
You can also add your own stop words to the list.
List of German stop words (Bird, Loper & Klein, 2009; customized)
aber, alle, allem, allen, aller, alles, als, also, am, an, ander, andere, anderem, anderen, anderer, anderes, anderm, andern, anderst, anders, auch, auf, aus, bei, bin, bis, bist, da, damit, dann, der, den, des, dem, die, das, dass, daß, derselbe, derselben, denselben, desselben, demselben, dieselbe, dieselben, dasselbe, dazu, dein, deine, deinem, deinen, deiner, deines, denn, derer, dessen, dich, dir, du, dies, diese, diesem, diesen, dieser, dieses, doch, dort, durch, ein, eine, einem, einen, einer, eines, einig, einige, einigem, einigen, einiger, einiges, einmal, er, ihn, ihm, es, etwas, euer, eure, eurem, euren, eurer, eures, für, gegen, gewesen, hab, habe, haben, hat, hatte, hatten, hier, hin, hinter, ich, mich, mir, ihr, ihre, ihrem, ihren, ihrer, ihres, euch, im, in, indem, ins, ist, jede, jedem, jeden, jeder, jedes, jene, jenem, jenen, jener, jenes, jetzt, kann, können, könnte, machen, man, manche, manchem, manchen, mancher, manches, mein, meine, meinem, meinen, meiner, meines, mit, muss, musste, nach, noch, nun, nur, ob, oder, sehr, sein, seine, seinem, seinen, seiner, seines, selbst, sich, sie, ihnen, sind, so, solche, solchem, solchen, solcher, solches, soll, sollte, sondern, sonst, über, um, und, uns, unsere, unserem, unseren, unser, unseres, unter, viel, vom, von, vor, während, war, waren, warst, was, weg, weil, weiter, welche, welchem, welchen, welcher, welches, wenn, werde, werden, wie, wieder, will, wir, wird, wirst, wo, wollen, wollte, würde, würden, zu, zum, zur, zwar, zwischen
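Stop word filtering can be sketched as a case-insensitive set lookup. This minimal Python sketch assumes that inflections are covered because they are listed explicitly, as in the list above, and it uses only a small excerpt of that list. The helper names and the custom entry "Julia" are hypothetical.

```python
from typing import Iterable, List, Set

# Small excerpt of the German stop word list above.
DEFAULT_STOP_WORDS = ["aber", "alle", "und", "der", "die", "das", "ist", "sehr"]


def build_stop_words(default: Iterable[str], custom: Iterable[str] = ()) -> Set[str]:
    """Merge the default list with user-defined stop words, lower-cased."""
    return {w.lower() for w in default} | {w.lower() for w in custom}


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words before the sentiment lookup; the check ignores case."""
    return [t for t in tokens if t.lower() not in stop_words]


stop_words = build_stop_words(DEFAULT_STOP_WORDS, custom=["Julia"])  # "Julia" is a custom entry
tokens = ["Die", "Liebe", "ist", "sehr", "schön", "Julia"]
print(remove_stop_words(tokens, stop_words))  # ['Liebe', 'schön']
```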
Negations
Negations are detected in the immediate context of a sentiment-bearing word. The negation must be no more than four tokens away from the sentiment-bearing word, and there must be no punctuation between the two. If a word is negated, its sentiment is shifted to the opposite polarity.
Scope of Negation Detection
Tom NOT LOVE Julia: "love" is negative
Tom NOT very love Julia: "love" is negative
Tom not, love Julia: "love" is positive, because there is a punctuation mark between "not" and "love".
List of German negations (Weinrich, 2007)
kein, keine, keinem, keinen, keiner, keines, keins, nicht, nichts, nein, nie, niemals, niemand, nirgends, nirgendwo, nirgendwoher, nirgendwohin, keinesfalls, keineswegs, mitnichten, ohne, weder noch
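The scope rule can be sketched in Python as follows. Several details in it are assumptions not stated above:

- The window is searched both before and after the word, since German negations such as "nicht" often follow the verb.
- The negation check ignores case.
- The words treated as punctuation (aber, weil, und) also end the scope.
- "Shifted to the opposite polarity" is modeled as flipping the sign of the lexicon value.

The sketch uses only excerpts of the lists, ignores multi-word negations such as "weder noch", and its helper names are hypothetical.

```python
from typing import List

# Excerpts of the lists in this document (multi-word "weder noch" is not handled).
NEGATIONS = {"kein", "keine", "nicht", "nichts", "nie", "niemals", "ohne"}
PUNCTUATION = {",", ".", "!", "?", ";", ":", "..."}
PUNCT_WORDS = {"aber", "weil", "und"}  # words treated as punctuation
MAX_DISTANCE = 4


def is_boundary(token: str) -> bool:
    """True for punctuation and for words treated as punctuation."""
    return token in PUNCTUATION or token.lower() in PUNCT_WORDS


def is_negated(tokens: List[str], i: int, max_distance: int = MAX_DISTANCE) -> bool:
    """Search up to `max_distance` tokens before and after tokens[i] for a
    negation, stopping at the first punctuation boundary in each direction."""
    for step in (-1, 1):
        j = i + step
        while 0 <= j < len(tokens) and abs(j - i) <= max_distance:
            if is_boundary(tokens[j]):
                break
            if tokens[j].lower() in NEGATIONS:
                return True
            j += step
    return False


def polarity(tokens: List[str], i: int, value: float) -> float:
    """Flip the sign of the lexicon value if the word is negated."""
    return -value if is_negated(tokens, i) else value


print(polarity(["Ich", "liebe", "dich", "nicht"], 1, 0.5))       # -0.5
print(polarity(["Nicht", ",", "ich", "liebe", "dich"], 3, 0.5))  # 0.5
```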
Negation based on the morphological structure of the word
Words can also be negated by a prefix or suffix, for example 'unschön' (prefix) and 'herzlos' (suffix).
List of German negating prefixes (Weinrich, 2007)
nicht-, un-, miß-, fehl-, schein-, in-, il-, im-, ir-, a-, an-, dis-, des-, beinahe-, möchtegern-, pseudo-, quasi-, non-
List of German negating suffixes (Weinrich, 2007)
-frei, -leer, -los, -losigkeit, -arm
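One possible way to apply the affix lists is sketched below. The strategy is an assumption, not necessarily how SentText works. A word that is not in the lexicon is stripped of a negating prefix or suffix. If the remaining stem is a lexicon entry, the stem's value is used with flipped sign. Requiring the stem to be in the lexicon limits false matches from short prefixes such as "a-" or "in-". The lexicon is assumed to have lower-case keys, and its values are made up.

```python
from typing import Dict, Optional

# Affix lists from this section, without hyphens; longest affixes are tried first.
NEG_PREFIXES = sorted(
    ["nicht", "un", "miß", "fehl", "schein", "in", "il", "im", "ir", "a", "an",
     "dis", "des", "beinahe", "möchtegern", "pseudo", "quasi", "non"],
    key=len, reverse=True,
)
NEG_SUFFIXES = sorted(["frei", "leer", "los", "losigkeit", "arm"], key=len, reverse=True)


def morphological_polarity(word: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Return the word's polarity, negating it if it is only found via a
    stem left after removing a negating prefix or suffix."""
    w = word.lower()
    if w in lexicon:
        return lexicon[w]
    for prefix in NEG_PREFIXES:
        if w.startswith(prefix) and w[len(prefix):] in lexicon:
            return -lexicon[w[len(prefix):]]
    for suffix in NEG_SUFFIXES:
        if w.endswith(suffix) and w[: -len(suffix)] in lexicon:
            return -lexicon[w[: -len(suffix)]]
    return None


lexicon = {"schön": 0.4, "herz": 0.2}  # made-up values, lower-case keys
print(morphological_polarity("unschön", lexicon))  # -0.4
print(morphological_polarity("herzlos", lexicon))  # -0.2
print(morphological_polarity("Haus", lexicon))     # None
```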
Punctuation marks for the sentiment analysis (Bird, Loper & Klein, 2009; customized)
' ! " # & '' ‚ ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ -- — „ “ `` ... « »
Words that are treated as punctuation
German: aber, weil, und
Use of a dictionary
If you want to use your own lexicon, create a CSV file with the sentiment-bearing words in the first column and the corresponding polarity strengths in the second column. Use ';' as the separator. The lexicon should not have a header row.
With your own lexicon, SentText can in principle also be used for languages other than German.
Example:
decrease;-0.0048
love;0.4902
destructive;-0.0048
For the analysis to work correctly, positive sentiment must be represented by positive numbers and negative sentiment by negative numbers (starting with a '-'). Either '.' or ',' can be used as the decimal separator.
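Reading such a lexicon can be sketched with Python's standard `csv` module. `load_lexicon` is a hypothetical helper. In practice, the file would be opened with `open(path, encoding="utf-8", newline="")`; the UTF-8 encoding is an assumption about your file.

```python
import csv
import io
from typing import Dict, TextIO


def load_lexicon(f: TextIO) -> Dict[str, float]:
    """Parse a header-less lexicon with lines of the form word;polarity.

    Both '.' and ',' are accepted as decimal separators.
    """
    lexicon: Dict[str, float] = {}
    reader = csv.reader(f, delimiter=";")
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"line {reader.line_num}: expected 'word;polarity'")
        word, value = row[0].strip(), row[1].strip()
        lexicon[word] = float(value.replace(",", "."))  # accept ',' as decimal separator
    return lexicon


sample = "decrease;-0.0048\nlove;0.4902\ndestructive;-0,0048\n"
print(load_lexicon(io.StringIO(sample)))
# {'decrease': -0.0048, 'love': 0.4902, 'destructive': -0.0048}
```

The third line uses ',' as the decimal separator and is parsed to the same value as '-0.0048'.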
What data will be saved?
The application does not save any data. As a consequence, all changes are lost as soon as you leave or reload the results page, so please save everything you want to keep beforehand.
Contact
If you have questions or comments about the application, or if a bug has occurred, please contact us at thomas.schmidt@ur.de.
Lexicons / Dictionaries
By default, we offer two external German sentiment lexicons (dictionaries). For more information, please refer to the following papers:
Remus, R., Quasthoff, U., & Heyer, G. (2010). SentiWS – a Publicly Available German-language Resource for Sentiment Analysis. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 1168–1171. Valletta, Malta: European Language Resources Association.
Võ, M. L.-H., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin Affective Word List Reloaded (BAWL-R). Behavior Research Methods, 41(2), 534–539.
Further Reading and Tools
We have conducted research on lexicon-based sentiment analysis of literary texts by G. E. Lessing. If you want to know more about this method, please take a look at the following papers. We also developed a small visualization tool for the results of Lessing's plays.
Resources
Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media Inc.
Weinrich, H. (2007). Textgrammatik der deutschen Sprache (4th ed.). Hildesheim: Georg Olms.