Recording Music On The Linux Terminal

If you've always wondered how to record music on the Linux terminal, you've come to the right place -- lucky you! I always wondered how to do it myself, and after tinkering I've found several applications that get the job done. They are listed below.

Jack: The Jack Audio Connection Kit is essential if you want low latency and the ability to record applications directly. It could be pulled off with ALSA alone, but you would need to set up a loopback device, and if you were recording one track while listening to another, the loopback would pick up everything. So yeah, Jack is the bomb. There are two versions of Jack, and they are compatible with each other; I prefer Jack1, but the methods used in this guide should work for Jack2 as well.
Ecasound: Ecasound is an amazing piece of software that's been developed for almost two decades. It's a multitrack recorder/player, effects processor and a whole bunch of other stuff. Suffice it to say that it's really cool and it's provided me with everything I need. If you discover something superior let me know, but it's the best solution I know of.
Sox: Socks was the name of my cat when I was a kid, so whenever I use Sox I think of my cat. Anyhow I digress: Sox is the "Swiss army knife" of audio manipulation. It can play/record audio, mix audio files together, trim, add effects, lots of cool stuff. I use it for reverb, in/out fading, trimming, equalizing, normalizing, compressing, etc etc.
Midish: Midish is a Midi sequencer for the terminal that allows you to edit/record/play Midi, quantize, all this crazy and neat stuff. It can come in handy if you want to operate in pure Midi, which happens if you're using things like Fluidsynth, Yoshimi or LinuxSampler to provide instrumentation. The quantize feature can make it easier to get a solid backbeat down for recording vocals, guitars or things that aren't Midi.
Fluidsynth: Fluidsynth is a sample-based synthesizer that has a command-line interface and a daemon mode. It primarily uses Soundfonts, an older format for storing sampled instruments. Although Soundfont is an antiquated format, there are still plenty of good samples and GM Midi banks available, such as the Crisis soundfont.
LinuxSampler: LinuxSampler is arguably the best software sampler available today. In fact, it was one of the first EVER software samplers available; it's been developed for a long time, like Ecasound. It supports Gigasampler, sfz and Soundfont, although Fluidsynth is probably a better option for Soundfonts because LinuxSampler primarily focuses on the other two formats. Sfz is a modern sampling format that is non-monolithic, that is, it stores all the samples in separate files instead of one huge file. The format is also textual in nature, so no special editor is required to edit the main file, which specifies how the instrument should behave. There are many incredibly great-sounding instruments floating around on the internet in the sfz format, including the Salamander grand piano and drumkit.
Yoshimi: What recording studio would be complete without a synthesizer? Yoshimi is just that: a software synthesizer that uses additive, subtractive and PAD algorithms to generate sounds. It stands out because not only are the sounds it can generate limitless, it is fully controllable from the command line. The developers were kind enough to consider accessibility, thus there are methods for using any control in the graphical interface from the command line. Yoshimi should accommodate all of your synthesizing needs. Now I feel like we're in chemistry class.

Individual Pieces

Now I'm going to describe each piece in detail and give instructions on basic usage/setup.

Jack

As I mentioned earlier, Jack stands for Jack Audio Connection Kit. It is a low-latency sound server that allows you to record any application directly. It is also possible to make connections from one application to another. For example, one can connect Yoshimi's "yoshimi:midi in" port to "system:midi_out_2", which allows you to play the software synth with a keyboard. ALSA also has a Midi system, which I will discuss later.
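A connection like the one just described can be made from the terminal with the jack_connect tool that ships with Jack. The following is only a sketch: the port names are the ones quoted above and will probably differ on your system (jack_lsp lists the ports that actually exist), and the connect_keyboard function name is made up for this example.

```shell
#!/bin/sh
# Sketch: wire a Midi keyboard to Yoshimi through Jack.
# Both port names are taken from the text and are assumptions about
# your setup; run `jack_lsp` to see the real ones.
KEYBOARD_PORT="system:midi_out_2"
SYNTH_PORT="yoshimi:midi in"

connect_keyboard() {
    # jack_connect takes two port names; the quotes matter because
    # Yoshimi's port name contains a space.
    jack_connect "$KEYBOARD_PORT" "$SYNTH_PORT"
}

# Only attempt the connection when the Jack tools are installed
# and a server is actually running.
if command -v jack_lsp >/dev/null 2>&1 && jack_lsp >/dev/null 2>&1; then
    connect_keyboard
else
    echo "Jack is not running; start it first." >&2
fi
```

The matching jack_disconnect tool takes the same two port names and removes the connection again.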

If you use Speakup/Orca (as I do), you will run into the problem of speech not working once Jack starts. I tried a variety of workarounds (libao with the Jack plugin, and speechd-up), but they added a lot of latency, and libao stopped working for me after a while. I also tried the alsa-jack plugin, but it didn't work at all, and it has the added problem that the Jack client disappears whenever audio stops playing. Speech synthesis from an application like eSpeakup or Speech Dispatcher is not continuous, so that solution cannot work. The third solution I tried works rather well, and it is the one I will share here. It requires only two steps:

Modify asound.conf to create the loopback device
Write a shell script to connect the loopback device to system:playback_1 and system:playback_2 so you can actually hear the ALSA apps

As a side note, once you modify asound.conf, ALSA apps won't work unless Jack is started and the loopback device is connected to Jack's system:playback ports. When I'm not using Jack, I simply move asound.conf to asound.conf.bak and everything works fine again. You could instead change the default ALSA device back to the default soundcard and manually switch ALSA when Jack starts, but I find this more inconvenient than simply moving asound.conf.

Copy asound.conf to /etc and the shell script to /usr/bin, and use chmod +x to make the shell script executable. Kill all ALSA apps, start Jack and then run the loop_jack script before restarting eSpeakup or Orca. A small caveat is that Orca and eSpeakup will not run at the same time. I have a feeling this has to do with eSpeakup taking control of ALSA, as no other ALSA apps will work while eSpeakup is running; eSpeakup must also be started with root permissions, which may have something to do with it. For me this isn't that big of a deal (most people use Pulseaudio, which has the same issue), but it does mean that when you wish to use Jack you must choose between Orca and Speakup. Since we're dealing with the terminal, Speakup might be the better option, but this is all user preference.
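For readers who want to see the general shape of such a bridge script, here is a rough sketch. It is not the loop_jack script referred to above. It assumes the snd-aloop kernel module is loaded, that asound.conf sends ALSA playback to hw:Loopback,0,0, and that the other end of the loop can therefore be captured from hw:Loopback,1,0. The client name "cloop" and the bridge_loopback function are made up for illustration. alsa_in is the small ALSA-to-Jack bridge tool distributed with Jack.

```shell
#!/bin/sh
# Sketch of an ALSA-loopback-to-Jack bridge (all names are assumptions).
LOOP_DEVICE="hw:Loopback,1,0"   # capture side of the snd-aloop device
CLIENT="cloop"                  # Jack client name for the bridge

bridge_loopback() {
    # alsa_in exposes an ALSA capture device as a Jack client.
    alsa_in -j "$CLIENT" -d "$LOOP_DEVICE" >/dev/null 2>&1 &
    sleep 1   # give the client a moment to register its ports
    # Route both channels to the soundcard so ALSA apps become audible.
    jack_connect "${CLIENT}:capture_1" system:playback_1
    jack_connect "${CLIENT}:capture_2" system:playback_2
}

# Only build the bridge when a Jack server is actually running.
if command -v jack_lsp >/dev/null 2>&1 && jack_lsp >/dev/null 2>&1; then
    bridge_loopback
else
    echo "Start Jack before running this script." >&2
fi
```

The one-second pause is a crude way of waiting for the new client's ports; a more careful script would poll jack_lsp until they appear.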

For my convenience I wrote a shell script to start Jack. It takes one argument -- the preferred sampling rate. A detailed list of all the options Jack accepts is given in the manual, but I will go over the ones I used in the shell script:

The -d switch specifies which driver Jack will use for audio. ALSA is the option on Linux, but I think OSS could also be an option for the folks using BSD.
The -r switch specifies the sampling rate.
The --periods switch controls how many periods occur per buffer (the jackd manual spells this option -n or --nperiods for the ALSA backend; the period size itself is set with -p). A larger period size reduces CPU usage but increases latency. On my machine a latency of 2.9 ms is achieved with a sampling rate of 48000 Hz and a period size of 128. This is overkill; anything below 5 ms will probably not make a noticeable difference, although latencies above 5 ms may start to become perceptible. 5 ms plus or minus 2.5 ms is probably a good range.
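To get a feel for how period size and sampling rate trade off, the duration of a single period can be worked out as period size divided by sampling rate. The helper below is a rough sketch (the period_ms name is made up), and the latency Jack itself reports also depends on the number of periods and on the hardware, so treat the result as a ballpark figure only.

```shell
#!/bin/sh
# Rough estimate of how long one period of audio lasts.
# Usage: period_ms FRAMES RATE  ->  milliseconds, two decimal places
period_ms() {
    awk -v frames="$1" -v rate="$2" \
        'BEGIN { printf "%.2f\n", frames / rate * 1000 }'
}

# When called with two arguments, print the estimate for them.
if [ "$#" -eq 2 ]; then
    period_ms "$1" "$2"
fi
```

For example, passing 256 and 48000 shows what doubling the period size does to the per-period delay at that rate.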

Another note: Jack by default uses DBus to initialize audio. This can be inconvenient for many reasons. I could never get Jack to start in the terminal with DBus, so you may need to set the environment variable JACK_NO_AUDIO_RESERVATION=1 before Jack will start in the terminal. I simply added a shell script to /etc/profile.d that exports this each time the system starts. You could also add the line "export JACK_NO_AUDIO_RESERVATION=1" to the start_jack.sh shell script. Either way should work.
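Putting the switches and the environment variable together, a start script along these lines might look like the sketch below. It is an illustration and not the original script: the period size of 128 comes from the text, two periods per buffer is jackd's default for the ALSA backend, the short options -p and -n are the spellings used in the jackd manual, and the jack_command helper is made up so the command line can be inspected without starting anything.

```shell
#!/bin/sh
# Sketch of a Jack start script. Usage: start_jack RATE  (e.g. 48000)

# Skip the D-Bus device reservation that can stop Jack from
# starting in the terminal.
export JACK_NO_AUDIO_RESERVATION=1

PERIOD_SIZE=128   # frames per period (value mentioned in the text)
NPERIODS=2        # periods per buffer (jackd's ALSA default)

jack_command() {
    # Print the jackd command line for the given sampling rate.
    rate="${1:-48000}"
    echo "jackd -d alsa -r $rate -p $PERIOD_SIZE -n $NPERIODS"
}

if [ "$#" -ge 1 ]; then
    # Word splitting of the printed command is intentional here.
    exec $(jack_command "$1")
else
    echo "usage: start_jack RATE" >&2
fi
```

Keeping the command in a function makes it easy to check what would be run before handing control to jackd.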

Just run "start_jack 48000 &" (the & at the end will run the script in the background) and you should be good to go!