***********************************************************************************
*	    Overview of data steps in Bilicka, Casi, Seregni and Stage		  
*	          Tax Strategy Disclosure: A Greenwashing Mandate?
*			    Last updated: 12 March 2025			          
***********************************************************************************

The Excel file "Identifiers" contains the ISINs of the firms in our final disclosure
and ETR samples.
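Users who merge these ISINs onto other data may want to screen them for transcription
errors first. Below is a minimal sketch of the standard ISIN check-digit test (ISO 6166:
letters are mapped to 10-35, then the resulting digit string must pass the Luhn mod-10
check). It assumes the ISINs have already been read from the Excel file as strings. The
function name is_valid_isin is hypothetical and is not part of our code package.

```python
import re


def is_valid_isin(isin):
    """Check the format and check digit of an ISIN (ISO 6166).

    An ISIN has 12 characters: a 2-letter country code, 9 alphanumeric
    characters, and 1 numeric check digit.
    """
    isin = isin.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}[0-9]", isin):
        return False
    # Expand letters to two-digit numbers: int("A", 36) == 10, int("Z", 36) == 35;
    # digits map to themselves.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: starting from the rightmost digit, double every second digit and
    # subtract 9 from any doubled value above 9.
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Illustrative ISINs only, not taken from the "Identifiers" file.
print(is_valid_isin("US0378331005"))  # prints True
print(is_valid_isin("US0378331006"))  # prints False (wrong check digit)
```

The check only catches malformed identifiers and most single-character typos. It cannot
tell whether a well-formed ISIN belongs to the intended firm.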

We provide four sets of code used to create the datasets and run the analyses in the
paper.

1. "ALL_RCode" contains the R code that processes the raw data and constructs the
relevant inputs for our analyses (corresponding log file: "ALL_RCode_log")

2. "NaiveBayes_analysis" contains the Python code that identifies the tax strategy
sentences using the Naive Bayes (NB) classifier (corresponding log file:
"NaiveBayes_analysis_log")

3. "All_statacode_mainsamples" contains the Stata code that creates all the relevant
variables and conducts the empirical analyses for the main samples and the appendices,
except for Table F2, which is covered by item 4 (corresponding log file:
"All_statacode_mainsamples_log")

4. "Orbis data cleaning and table F2" contains the code that takes the raw data through
to the results for Appendix Table F2 (corresponding log files:
"OrbisunconsolidatedreplicationA" for Panel A and "OrbisunconsolidatedreplicationB"
for Panel B)
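For readers unfamiliar with the classifier in item 2, the sketch below illustrates the
general technique: a multinomial Naive Bayes model with add-one (Laplace) smoothing that
labels each sentence as tax strategy (1) or other (0). It is a minimal, self-contained
sketch, not the code in "NaiveBayes_analysis". The toy training sentences, the tokenizer,
and the function names train_nb and predict_nb are hypothetical, and our actual script
may use different preprocessing, features, and a library implementation.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def train_nb(sentences, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing.

    Returns class log-priors, per-class word counts, per-class token
    totals, and the training vocabulary.
    """
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for sentence, label in zip(sentences, labels):
        word_counts[label].update(tokenize(sentence))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    n = len(labels)
    log_priors = {c: math.log(k / n) for c, k in class_counts.items()}
    totals = {c: sum(counts.values()) for c, counts in word_counts.items()}
    return log_priors, word_counts, totals, vocab


def predict_nb(model, sentence):
    """Return the class with the highest posterior log-probability."""
    log_priors, word_counts, totals, vocab = model
    v = len(vocab)
    scores = {}
    for c, prior in log_priors.items():
        score = prior
        for token in tokenize(sentence):
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[c][token] + 1) / (totals[c] + v))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data: 1 = tax strategy sentence, 0 = other.
train_sentences = [
    "Our tax strategy is to pay the right amount of tax",
    "We manage tax risk in line with our tax policy",
    "The board approves the group tax strategy annually",
    "Revenue increased due to strong sales in Europe",
    "We opened three new stores during the year",
    "Employee engagement remained high this year",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_nb(train_sentences, train_labels)
print(predict_nb(model, "Our approach to tax risk is set out in our tax strategy"))  # prints 1
print(predict_nb(model, "Sales in Europe increased this year"))  # prints 0
```

Working in log-probabilities avoids numerical underflow when sentences are long, and the
add-one smoothing keeps a word that appears in only one class from driving the other
class's probability to zero.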