dc.date.accessioned: 2021-09-24T14:30:14Z
dc.date.available: 2021-09-24T14:30:14Z
dc.identifier.uri: https://fif.hebis.de/xmlui/handle/123456789/1902
dc.description.abstract: Overall, we have collected 203,886 online articles that were published on three platforms between January 23, 2020 and June 22, 2020. Reuters.com and nytimes.com are the websites of the international news companies Thomson Reuters and The New York Times Company, respectively. The topics they cover include business, politics, financial markets, science, and health. In addition, we have collected data from MarketWatch, which focuses purely on financial news and stock market data. MarketWatch articles have the highest average word count (706) and the lowest maximum word count (3,857). The data collection process consists of three steps. First, we gather the URLs of the online articles either through the API or by web crawling. The Reuters and MarketWatch crawlers are built around a link extractor written in Python with Scrapy. The main goal of web scraping is to extract structured data from unstructured web pages. Scrapy provides the Spider class, which defines how to crawl and parse pages in order to extract items from a particular site (e.g., by specifying which links to follow). In addition, the Item class provides a container for the scraped data. The API and the crawlers allow us to store the metadata in the database, such as headline, author, publication date and URL. Second, we filter for COVID-19 URLs using related keywords such as "COVID" and "Corona". In the last step, we collect all text elements (p tags) from the pages behind the remaining URLs, which gives the date, title, author and text of each article. Figure 1 depicts the weekly number of collected articles on NYTimes, Reuters and MarketWatch during the course of the pandemic.
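The second and third steps described in the abstract (keyword filtering and collecting the p-tag text) can be sketched as follows. This is a minimal illustration and not the authors' code: the original crawlers use Scrapy, whereas this sketch uses only the Python standard library so that it stays self-contained. The names `KEYWORDS`, `is_covid_related`, `ParagraphExtractor` and `extract_paragraphs`, the record fields, and the choice to match keywords against both headline and URL are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumed keyword list; the abstract names only "COVID" and "Corona" as examples.
KEYWORDS = ("covid", "corona")


def is_covid_related(record):
    """Return True if the headline or URL contains one of the keywords."""
    haystack = (record.get("headline", "") + " " + record.get("url", "")).lower()
    return any(keyword in haystack for keyword in KEYWORDS)


class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)

    def _flush(self):
        # Join the collected pieces and normalise runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []


def extract_paragraphs(html):
    """Return the whitespace-normalised text of each <p> element."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return parser.paragraphs
```

In use, the metadata records gathered in the first step would be passed through `is_covid_related`, and the HTML of each remaining page would be handed to `extract_paragraphs`. Plain substring matching is a deliberately simple choice here; the source does not state how the keywords were matched.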
dc.rights: Attribution-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-sa/4.0/
dc.subject: Financial Markets
dc.title: Survey_CNHP_2020
dc.type: Research Data
dcterms.isReferencedBy: https://fif.hebis.de/xmlui/handle/123456789/2394 (Machine Learning Sentiment Analysis, Covid-19 News and Stock Market Reactions)
dc.subject.keywords: covid-19 news
dc.subject.keywords: sentiment analysis
dc.subject.keywords: stock markets
dc.subject.jel: G10
dc.subject.jel: G14
dc.subject.jel: G15
dc.identifier.url: http://dx.doi.org/10.2139/ssrn.3690922


Files in this item

There are no files associated with this item.

Except where otherwise noted, this item's license is described as Attribution-ShareAlike 4.0 International.