- Be able to check for the existence of files on disk.
- Understand how to represent data to facilitate saving to disk.
- Be able to save data to a file on disk.
The typical purpose of running an experiment is to generate some data that we can use to address our question of interest. Hence, it is important that we are able to store the data generated by our experiments to a file on disk.
Creating and checking a file location
The first step in saving data from an experiment is knowing where to save the data to.
A common scenario is to save the data from each session (subject and repeat, perhaps condition) to a separate file.
This approach maximises the flexibility for subsequent analyses.
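As a rough sketch of how such a per-session file name could be assembled, the snippet below joins the subject, condition, and repeat into one string. The function name make_data_path and its parameters are hypothetical, chosen for illustration only, and the naming pattern is just one possibility.

```python
def make_data_path(subject_id, condition, repeat):
    """Build a file name for one session (hypothetical helper)."""
    # the naming pattern here is an assumption, not a required convention
    return "p{0}_exp_cond_{1}_rep_{2}.tsv".format(subject_id, condition, repeat)


data_path = make_data_path(subject_id=1000, condition=1, repeat=1)

print(data_path)
```

Building the name in one place means that the same string can be reused for both checking and saving later in the experiment.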
We may end up with a file name such as p1000_exp_cond_1_rep_1.tsv.
A useful step at this point is to check that a file with this name doesn’t already exist—it is very frustrating to lose data because it was overwritten!
This is also best done right at the start of an experiment, before any data is collected, so that the impact of a wrong filename is minimised.
We can check whether the file already exists by using the os.path.exists function:

    import os

    data_path = "p1000_exp_cond_1_rep_1.tsv"

    data_path_exists = os.path.exists(data_path)

    print(data_path_exists)
If the file already exists, a safe strategy would be to exit the program at this point and tell the user about the problem.
One way to do this is to use the sys.exit function:

    import os
    import sys

    data_path = "p1000_exp_cond_1_rep_1.tsv"

    data_path_exists = os.path.exists(data_path)

    # we will pretend that it does exist
    data_path_exists = True

    if data_path_exists:
        sys.exit("Filename " + data_path + " already exists!")

    Filename p1000_exp_cond_1_rep_1.tsv already exists!
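The check and the exit could also be wrapped into a single function that is called at the start of the experiment. The following is a minimal sketch of that idea; the name exit_if_exists is hypothetical, and the file name passed to it is assumed not to have been used yet.

```python
import os
import sys


def exit_if_exists(data_path):
    """Stop the program if data_path is already present (hypothetical helper)."""
    if os.path.exists(data_path):
        sys.exit("Filename " + data_path + " already exists!")


# a file name that we assume has not been used yet
exit_if_exists("p1000_exp_cond_1_rep_2.tsv")

# this line is only reached if the file was not found
print("Safe to continue")
```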
Organising data output
While the data you collect in an experiment may come in many forms, it is best to work out a way in which the data can be represented in a “tabular” form; that is, consisting of rows and columns. This is the easiest format with which to store data on disk.
For example, say your experiment consists of 10 trials where each trial shows a grating at a particular orientation and records whether the participant pressed the left or right arrow key in response. This could be organised in the data file as 10 rows and 2 columns; each row represents a trial, and the two columns give the grating orientation and the response on that particular trial. For example:
    import random
    import pprint

    # start with an empty list; one row is added per trial
    data = []

    for trial in range(10):
        data.append(
            [
                random.uniform(0, 180),
                random.choice(["left", "right"])
            ]
        )

    pprint.pprint(data)

    [[170.2380134766284, 'left'],
     [21.43877741975255, 'left'],
     [44.29799618149051, 'right'],
     [123.78765580362592, 'right'],
     [90.4004819386519, 'left'],
     [65.2586502574488, 'left'],
     [67.08706605950744, 'left'],
     [18.31637412188478, 'right'],
     [36.415401853733, 'left'],
     [89.6258850472638, 'left']]
Note that we are using a “pretty printer” (import pprint) to show the contents of data, as it makes the tabular organisation clearer.
However, to make it easier to save and load data to disk, we often convert strings into numbers by coding them. For example, rather than having “left” and “right”, we might use the numbers 1 and 2 to refer to the keys that were pressed.
    import random
    import pprint

    data = []

    for trial in range(10):
        data.append(
            [
                random.uniform(0, 180),
                random.choice(["left", "right"])
            ]
        )

    pprint.pprint(data)

    coded_data = []

    for data_row in data:

        # the response is the second item in each row
        # (note that this also changes the row inside data)
        if data_row[1] == "left":
            data_row[1] = 1
        elif data_row[1] == "right":
            data_row[1] = 2

        coded_data.append(data_row)

    pprint.pprint(coded_data)

    [[15.841285395394317, 'left'],
     [77.24308806155922, 'right'],
     [120.31863826174141, 'left'],
     [68.20413796190277, 'right'],
     [140.60816319586255, 'left'],
     [29.668900179043224, 'left'],
     [90.33576145547305, 'left'],
     [162.39870297804484, 'left'],
     [22.306018494281254, 'left'],
     [53.76279809035283, 'left']]
    [[15.841285395394317, 1],
     [77.24308806155922, 2],
     [120.31863826174141, 1],
     [68.20413796190277, 2],
     [140.60816319586255, 1],
     [29.668900179043224, 1],
     [90.33576145547305, 1],
     [162.39870297804484, 1],
     [22.306018494281254, 1],
     [53.76279809035283, 1]]
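The same coding could also be expressed with a dictionary that maps each key name to its number, which avoids a chain of if and elif statements when there are many possible responses. This is only a sketch: the name response_codes is hypothetical, and the three data rows are made-up values for illustration.

```python
# hypothetical lookup table from key name to numeric code
response_codes = {"left": 1, "right": 2}

# made-up example rows: orientation and response
data = [
    [15.8, "left"],
    [77.2, "right"],
    [120.3, "left"],
]

coded_data = []

for (orientation, response) in data:
    # build a new row so that the rows in data are left unchanged
    coded_data.append([orientation, response_codes[response]])

print(coded_data)
```

A response that is missing from the dictionary would raise a KeyError here, which can be a useful way of catching unexpected values early.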
Once we have our data assembled in a suitable format (i.e. list of lists, as above), we can save it to disk using the np.savetxt function.
The first argument is the filename, and the second is the variable containing the data to be saved.
We will also specify an optional argument called delimiter, which we will set as "\t".
This tells the savetxt function that we want columns to be separated by a TAB character.
This is a common way of storing files, and is reflected in the data’s filename (with “tsv” standing for “tab separated values”).
    import random
    import numpy as np
    import pprint

    data = []

    for trial in range(10):
        data.append(
            [
                random.uniform(0, 180),
                random.choice([1, 2])
            ]
        )

    pprint.pprint(data)

    # save with a TAB character between the columns
    np.savetxt(
        "p1000_exp_cond_1_run_2.tsv",
        data,
        delimiter="\t"
    )

    [[111.72308946245312, 2],
     [0.23890846120006692, 2],
     [175.28011985115702, 1],
     [112.06603323103002, 1],
     [86.06792584342334, 2],
     [44.22366893318996, 1],
     [101.4029141498375, 1],
     [115.703634373552, 1],
     [100.9374884735026, 1],
     [8.21493430720146, 2]]
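By default, savetxt writes every value in scientific notation, which can make the file harder to read by eye. As a hedged sketch, the optional fmt and header arguments of savetxt could be used to set the number format of each column and to add a line describing the columns. The file name, the column names, and the format choices below are assumptions made for illustration, and the data rows are made-up values. Note that writing the orientation with four decimal places discards any further precision.

```python
import os
import tempfile

import numpy as np

# made-up example values: orientation and coded response
data = [[111.7231, 2], [0.2389, 2], [175.2801, 1]]

# write into a temporary directory so that no real data file is touched
data_path = os.path.join(tempfile.mkdtemp(), "example_data.tsv")

np.savetxt(
    data_path,
    data,
    delimiter="\t",
    fmt=["%.4f", "%d"],  # one format per column
    header="orientation\tresponse",  # written as a comment line starting with "# "
)

# show the raw contents of the file
with open(data_path) as data_file:
    print(data_file.read())
```

Because the header line begins with the comment character, loadtxt skips it when the file is read back in.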
We can then load the data using loadtxt to verify that we are indeed able to do so:

    import numpy as np
    import pprint

    data = np.loadtxt(
        "p1000_exp_cond_1_run_2.tsv",
        delimiter="\t"
    )

    pprint.pprint(data.tolist())

    [[111.72308946245312, 2.0],
     [0.23890846120006692, 2.0],
     [175.28011985115702, 1.0],
     [112.06603323103002, 1.0],
     [86.06792584342334, 2.0],
     [44.22366893318996, 1.0],
     [101.4029141498375, 1.0],
     [115.703634373552, 1.0],
     [100.9374884735026, 1.0],
     [8.21493430720146, 2.0]]
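To confirm that nothing was lost along the way, the loaded values could be compared against the values that were saved. The sketch below does this with np.allclose and then pulls the two columns apart again, converting the response codes back to integers since loadtxt returns them as floating-point numbers. The variable names and the temporary file are assumptions for illustration, and the rows are a few example values in the same layout as above.

```python
import os
import tempfile

import numpy as np

# example values: orientation and coded response (1 = left, 2 = right)
data = [
    [111.72308946245312, 2],
    [0.23890846120006692, 2],
    [175.28011985115702, 1],
]

# use a temporary directory so that no real data file is touched
data_path = os.path.join(tempfile.mkdtemp(), "example_data.tsv")

np.savetxt(data_path, data, delimiter="\t")

loaded_data = np.loadtxt(data_path, delimiter="\t")

# True if every loaded value matches the value that was saved
print(np.allclose(loaded_data, data))

# each column can be pulled out by its index
orientations = loaded_data[:, 0]
responses = loaded_data[:, 1].astype(int)

print(responses.tolist())
```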