This blog post is intended to provide some general advice and code-writing tips and it is not a complete guide to Matlab programming. There are many Matlab introduction tutorials online, and several books from which to learn Matlab. However, it is a good idea even for readers with modest Matlab experience to read this post.
Write clean and efficient code
It is true that cognitive electrophysiology studies are evaluated on the quality of their theories, hypotheses, experimental design, analyses, and interpretations, not on the aesthetics of the programming code used to analyze the data. However, there are good reasons to try to write clean and efficient code. Keep reading.
Clean code is easy to read and understand
Having easy-to-read code will help you avoid programming errors, and when errors do occur, it will help you identify and remedy them faster. Moreover, if your code is clean and commented, you will be able to return to it after months of not looking at it and still understand what it does and how to work with it. This is even more important if you want to share your code with other people and want them to be able to understand and adapt it.
There are three strategies to keep in mind that will help you write clean code:
- Use comments. Write comments before a line of code or a collection of lines of code to indicate succinctly what those lines do. Keep comments brief and to the point, and do not use comments excessively.
- Group lines of code by their common purpose.
- Use sensible and interpretable variable names (more on this in the next section).
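As a small illustration of these three strategies, here is a sketch that uses made-up data. The variable names (EEG_Data, baseline_window, and so on) and all values are hypothetical and are not taken from any toolbox.

```matlab
% --- Parameters (hypothetical values) ---
n_channels   = 4;
n_timepoints = 500;

% --- Create example data: channels x time points ---
EEG_Data = randn(n_channels, n_timepoints);

% --- Baseline correction ---
% Subtract each channel's mean over the first 100 samples.
baseline_window = 1:100;
baseline_mean   = mean(EEG_Data(:, baseline_window), 2);
EEG_Baselined   = EEG_Data - repmat(baseline_mean, 1, n_timepoints);
```

Each group of lines has one purpose, one short comment, and names that say what the variables contain.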
Efficient code runs faster
If your code runs faster, you will spend less time waiting for it to finish and more time looking at results and working on new analyses. There are several ways to write efficient code:
- Avoid redundancies. Performing the same computations multiple times might slow your code. For example, avoid evaluating an equation inside a loop if it could be evaluated outside the loop.
- Avoid lines of code that are never called or that do nothing.
- Avoid separating into multiple files what would better be done in one file, or vice-versa.
- Whenever possible, perform matrix manipulations instead of loops. Loops should be used only when necessary.
- Use informative comments. They will help you spot ways to make the code more efficient.
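The first and fourth points can be sketched with a toy example. The data and the names gain, offset, and scale_factor are invented for illustration only.

```matlab
n_points = 10000;
signal   = randn(1, n_points);   % made-up data
gain     = 2.5;
offset   = 1;

% Slower: the constant scale factor is recomputed on every iteration.
scaled_loop = zeros(1, n_points);
for i = 1:n_points
    scale_factor   = gain / sqrt(2);          % same value every time
    scaled_loop(i) = signal(i) * scale_factor + offset;
end

% Faster: compute the constant once and operate on the whole vector.
scale_factor  = gain / sqrt(2);
scaled_vector = signal * scale_factor + offset;

% Both approaches should give the same result.
max_difference = max(abs(scaled_loop - scaled_vector));
```

You can wrap each version in tic and toc to compare run times on your own machine.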
Clean and efficient code promotes clear and organized thinking
Programming is problem-solving. You must first conceptualize the problem, then break it down into subproblems, and then break those subproblems further down into individual digestible computer commands. Programming (and science, in general) involves taking an abstract idea for an analysis or figure you would like to show and turning that idea into a series of logical and concrete statements, statistical analyses, and theory-relevant interpretations.
Most of the hard work in programming should be done in your head and on a piece of paper first. Particularly if you are new to programming, start writing your script on a piece of paper with a pencil. Write down what the script should do and in what order. After you have a plan for how to write the script, then turn to your computer, open Matlab, and start programming.
After you consider the advice in the previous paragraphs, take this one additional piece of advice: do not obsess over having the cleanest and most efficient code possible. It is not the most important part of being a cognitive electrophysiologist, and you can keep improving your code over time. Still, as you write your code, whether you are programming all of your own analyses from scratch or writing a script to call EEGLAB commands, try to keep it efficient and clear. Your future self will thank you.
Use meaningful file and variable names
Give files useful names so that you can guess from the name what you will find when you open the script. For example, name a file “FingerTapping_30bpm_analyses.m”. Note that some versions of Matlab on some operating systems will give errors when calling files that have spaces in the file names; you can use underscores instead. Additionally, write commented notes at the top of each script that explain the purpose of that script.
It is even more important to give meaningful names to variables. Variables should have names that allow you to identify their purpose and distinguish them from other variables. For example, if you have EEG data to save in a variable, it is better to call the variable “EEG_Data” than to give it a generic name such as “data”.
Note that Matlab variable names must meet specific requirements:
- they cannot start with numbers but may contain them (e.g., Trial_1),
- they cannot contain nonalphanumeric characters other than the underscore (e.g., $, &, %, #),
- and they cannot contain spaces (underscores can be used instead).
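One way to check a name against these rules is Matlab's isvarname function. The sketch below tests a few candidate names, all made up for illustration.

```matlab
% Hypothetical candidate names to test.
candidate_names = {'Trial_1', '1_Trial', 'EEG Data', 'EEG_Data', 'amp$'};

% isvarname returns true only for names that Matlab accepts as variables.
is_valid = cellfun(@isvarname, candidate_names);

% Print the verdict for each name.
for i = 1:numel(candidate_names)
    fprintf('%-10s valid: %d\n', candidate_names{i}, is_valid(i));
end
```

Only “Trial_1” and “EEG_Data” pass: the others start with a number, contain a space, or contain a disallowed character.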
Make regular backups of your code and keep original copies of modified code
Spending hours on an analysis script only to accidentally close it without saving, or to overwrite the file, is not an enjoyable experience. Fortunately, Matlab automatically creates backup files (.asv on Windows or .m~ on Mac), which should help minimize accidental script loss, although these backups might be tens of minutes behind the original version. Also, save the script often. It won’t hurt anyone, and it will keep your work safe.
Avoid working on multiple copies of the same analysis script. For example, if you keep a spectrogram script on the server computer, and then you back up that script on your portable USB drive, make sure you are modifying only one of those scripts. Otherwise, you might end up making different changes to different versions of the script.
Furthermore, if you modify functions that come with Matlab or a third-party toolbox such as EEGLAB or BCILAB, it is a good idea to save the original function under a different name. For example, you can save the file as “filename_orig.m” and then modify “filename.m”. The other option is to modify the original file and comment each line you modify or add. This latter option is suboptimal when you make many modifications because keeping track of every change becomes cumbersome.
Initialize variables
Initializing a variable means reserving space in memory for it by creating the variable before populating it with data. Typically, the variable is set to contain all zeros, all ones, or all NaNs (not-a-number). You do not need to initialize all variables, particularly smaller variables or variables that you use for only a short period of time. However, larger or more important variables, such as those that contain data you intend to analyze or save, should be initialized before use. In Matlab, unlike in some other programming languages, it is permitted to add elements or dimensions to variables without initializing them first, but this should be avoided when possible. There are three reasons why you should initialize variables:
- Initializing variables, particularly for large matrices, helps avoid memory crashes.
- Initializing variables that will be populated inside a loop helps prevent data from previous iterations of the loop from contaminating current iterations. In some cases, you may not know how big a variable will be and thus cannot initialize it to its final size. In this case, it might be better to initialize the matrix to NaNs instead of zeros, because leftover NaNs are easy to spot whereas leftover zeros can be mistaken for data. A related mistake, which is difficult to find because it produces no Matlab error or warning, is putting the initialization in the wrong place: a variable that should be reset on every iteration must be initialized inside the loop, not before it.
- Initializing variables will help you to think in advance about the sizes, dimensions, and contents of large and important variables. As noted earlier, the more thinking you do before and during programming, the cleaner, more efficient, and less error-prone your scripts are likely to be.
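The following sketch illustrates both the preallocation and the placement of the initialization. The loop sizes and variable names are hypothetical, and the inner computation is only a stand-in for a real analysis.

```matlab
n_trials   = 5;
n_channels = 3;

% Reserve the full result vector before the loop. NaNs make any entry
% that was never filled easy to spot afterwards.
trial_sums = nan(n_trials, 1);

for trial_i = 1:n_trials
    % Reset the accumulator inside the trial loop. If this line were
    % placed before the loop, sums would carry over from trial to trial
    % without any error or warning.
    running_sum = 0;
    for chan_i = 1:n_channels
        running_sum = running_sum + chan_i;   % stand-in for a real computation
    end
    trial_sums(trial_i) = running_sum;
end

n_unfilled = sum(isnan(trial_sums));   % counts entries that were never filled
```

Here every trial should give the same sum (1 + 2 + 3); moving the reset line above the outer loop would instead produce a growing sequence.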
Getting help
Even the most experienced and savvy programmers get stuck sometimes. There will certainly come a time when you need help, at least with Matlab programming. Matlab programming issues generally fall into one of three categories.
You know the function name but don’t understand how it works
Start by typing “help <function name>” in the Matlab command window. In some cases, you can type “doc <function name>” to get a more detailed help file. If that is not enough, try the following:
- Run the examples. Many functions have help files that contain examples; try running the example lines of code.
- Read the code. Most functions are simply text files; you can open the function with “edit <function name>” and look through the code to try to understand how it works. This option is more useful once you have some experience with programming and reading other people’s code. Not all functions are viewable; some functions are compiled for speed.
- Use simpler inputs. Try running the code with simpler inputs for which you can better understand the output. For example, if you have a large four-dimensional matrix and are unsure which dimension is being averaged in the mean function, try creating smaller matrices of only a few numbers, for which you can easily compute the means, and compare these against the function outputs.
- Plot the data before and after calling the function to see what effect the function had on the data.
- Search the Internet to see if there are additional discussions of that function or additional examples of how to use it.
- Ask colleagues for help. However, try to figure it out on your own before asking someone else. It may initially seem like a waste of time to spend 30 min understanding a function when a colleague could explain it to you in 30 s, but if you figure it out on your own, you are likely to learn more from that experience, and therefore, you are likely to avoid making that kind of error in the future. This is how you become a better programmer.
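The simpler-inputs check for the mean function can be sketched as follows, using a tiny matrix (made up for this example) whose means are easy to compute by hand.

```matlab
small_matrix = [1 2 3; 4 5 6];   % 2 rows, 3 columns

mean_default = mean(small_matrix);      % one mean per column: [2.5 3.5 4.5]
mean_dim1    = mean(small_matrix, 1);   % same as the default for a matrix
mean_dim2    = mean(small_matrix, 2);   % one mean per row: [2; 5]
```

Comparing the sizes and values of the outputs shows which dimension was averaged, and the same reasoning carries over to larger matrices.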
You know what you want Matlab to do, but you can’t figure out the command or function for it
This is a frustrating problem. The three ways to solve this issue are by reading the help file of similar functions (in particular, look for the “see also” part at the end of the help file), searching on the Internet for what you want Matlab to do, and asking a colleague.
You know what you want Matlab to do and you know the command to do it, but there are errors when you run the code
Newcomers to Matlab seem to spend most of their time and frustration on this kind of Matlab issue. If your Matlab command window contains more red than black, do not give up hope; errors become less frequent over time as you learn from mistakes. Here are some tips for resolving this kind of Matlab error.
Find the function or command that produces the error
This may sound trivial but can be tricky if there are multiple functions called in one line. For example, if you get an error with the following line of code:
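A hypothetical line of this kind, assuming data is a time-points-by-trials matrix of phase angles and trials is a vector of trial indices, could look like the one below. The matrix contents, the 1i factor, and the averaging dimension are all assumptions made for illustration.

```matlab
data   = 2*pi*rand(100, 30);   % made-up phase angles: 100 time points x 30 trials
trials = 1:10;                 % hypothetical subset of trials

% Three nested function calls (abs, mean, exp) and two variables.
result = abs(mean(exp(1i*data(:,trials)), 2));
```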
You need to determine whether the error resulted from one of the three functions called (abs, mean, exp) or from one of the two variables (data, trials). To locate the error, start from the innermost (or most deeply embedded; most likely in the middle of the most parentheses or brackets) variable or function, and evaluate this in the Matlab command window. In this case, start by evaluating trials and see if that produces an error. If not, move to the next function or variable, which in this example is data(:,trials). Eventually, you will find where the error occurs.
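The inside-out search can be sketched as below, assuming a hypothetical line of the form abs(mean(exp(1i*data(:,trials)),2)) and made-up data with a deliberately invalid index. The eval call is used only to automate the steps; at the command window you would simply type each expression yourself.

```matlab
data   = 2*pi*rand(100, 30);   % made-up data: 100 time points x 30 trials
trials = [0 1 2];              % deliberately invalid: indices must start at 1

% Pieces of the hypothetical line, ordered from innermost to outermost.
steps = {'trials', ...
         'data(:,trials)', ...
         'exp(1i*data(:,trials))', ...
         'mean(exp(1i*data(:,trials)), 2)', ...
         'abs(mean(exp(1i*data(:,trials)), 2))'};

failing_step = '';
for i = 1:numel(steps)
    try
        eval([steps{i} ';']);          % evaluate one piece at a time
    catch err
        failing_step = steps{i};       % first piece that throws an error
        fprintf('Error in "%s": %s\n', failing_step, err.message);
        break
    end
end
```

With these inputs the first failing piece is data(:,trials), which points to the index variable rather than to any of the three functions.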
Read the error message (the red text)
Sometimes, error messages seem to be written in a foreign language. If you do not understand the error, look for keywords. For example, if the error message contains “matrix size” or “matrix dimensions”, use the size function to examine the dimensions of the variable that produced the error. If the error message contains “subscript indices must be real positive integers” or “index exceeds matrix dimensions”, then probably your index variable (in the above example, trials) has zeros, negative numbers, fractions, or numbers greater than the size of the matrix being indexed. Some error messages are more self-explanatory, such as “incorrect number of input arguments”.
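As a sketch of how to follow up on these keywords, the lines below inspect the size of a made-up matrix and flag the entries of a hypothetical index vector that would trigger an indexing error.

```matlab
data   = randn(100, 30);       % made-up data: 100 time points x 30 trials
trials = [0 2.5 31 4];         % hypothetical index vector with several problems

data_size = size(data);        % [100 30]

% An index into the second dimension is valid if it is a positive
% integer no larger than the number of columns.
is_valid_index = trials >= 1 & ...
                 trials == round(trials) & ...
                 trials <= data_size(2);

bad_indices = trials(~is_valid_index);   % [0 2.5 31]
```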
Plot the inputs and outputs of the functions that precede the error
If you still cannot solve the error, try plotting all of the inputs and outputs of the functions that precede the error; perhaps you will notice something strange or obvious in the plots. Missing data points in plots are likely to be NaNs or Infs (infinity); these can cause errors in some functions or when used as indexing variables. You can also use the step-in option, which will halt the function that produced the error at the offending line. This is beyond the scope of this post, but you can read more about stepping in on the Internet or in Matlab tutorials.
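Besides plotting, you can look for NaNs and Infs directly. The sketch below uses a made-up vector. The commented-out dbstop command is one built-in way to pause at the line that throws an error; it is shown as a comment because it is meant for interactive sessions.

```matlab
signal = [1 2 NaN 4 Inf 6];    % made-up data with two problem values

n_nan = sum(isnan(signal));    % number of NaNs
n_inf = sum(isinf(signal));    % number of Infs
problem_positions = find(~isfinite(signal));   % where they are: [3 5]

% In an interactive session, the following command pauses execution at
% the line that throws an error so that you can inspect the variables:
%   dbstop if error
% and "dbclear if error" switches this behavior off again.
```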
Other possible causes
Other possible causes of an error include the following:
- The function is not in Matlab’s path.
- The function is contained in a toolbox that you do not have.
- The function is compiled for or relies on libraries that are specific to a version of Matlab (e.g., 32-bit vs. 64-bit) or operating system.
- A variable has the same name as a function. For example, if you write “mean=my_data_part;” Matlab will treat mean as a variable instead of the function. Using the names of existing functions as variable names is bad practice. To find out whether a name is already used by a function or an existing variable, type “which <name>”.
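The last point can be demonstrated with a short sketch, meant to be run as a script or at the command window. The data values are made up, and the exist function is used alongside which because it returns a number that is easy to check in code.

```matlab
my_data_part = [10 20 60];

code_before = exist('mean');        % nonzero: the name is already a function
avg_before  = mean(my_data_part);   % calls the function: 30

mean = my_data_part;                % bad practice: mean is now a variable
code_after = exist('mean');         % 1, which means "variable in the workspace"
second_val = mean(2);               % indexes the variable: 20, not an average

clear mean                          % remove the variable to restore the function
```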
Be patient and embrace the learning experience
Debugging Matlab code can be an infuriating and humiliating experience that makes you want to quit science and sell flowers on the street. But don’t give up hope – it gets better. Embrace your mistakes and learn from them. Remember: no one is born a programmer. The difference between a good programmer and a bad programmer is that a good programmer spends years learning from his or her mistakes, and a bad programmer thinks that good programmers never make mistakes. I get annoyed when people think I am a good programmer because I can find and fix their bug in 30 s when they could not do it in 2 h. What they do not know is that I spent much more than 2 h finding and fixing that exact same bug in my own code, probably several times in the past. Eventually, I learned to recognize what that bug is, where in the code it is likely to be found, and how to fix it. Remember – time spent locating and fixing programming errors is not time lost; it is time invested.
Clean and efficient code will help make your life and your data analyses easier.
You can find further information in:
Analyzing Neural Time Series Data: Theory and Practice by Mike X Cohen