Extract data from a spreadsheet file on the linux command line

Sometimes we need to extract data from an Excel spreadsheet for analysis. Here is one approach using the ssconvert tool.

If this isnt installed on your linux machine then you most likely can get it from the package repository.

$ sudo apt install ssconvert

Then if you want to extract a spreadsheet file into a tsv it can be done like this:

$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' SomeData.xlsx SomeData.xlsx.tsv

You will notice that all the sheets are output to separate tsv files. This approach is nice as it can accommodate high throughput screening, as I implemented in my Gene Name Errors paper a while back.

Here is an example of obtaining some data from GEO.

$ #first download
$ curl 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE80251&format=file&file=GSE80251%5Fprocessed%5FRNA%5Fexpression%5Fmnfyap%2Exlsx' > GSE80251.xlsx

$ #now extract
$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' GSE80251.xlsx GSE80251.xlsx.txt

$ head -5 GSE80251.xlsx.txt.0 | cut -f-4
gene Untreated.AVE Erythromycin.AVE Clindamycin.AVE
aaaD 464.2738789413636 216.166053873777 908.8802442142787
aaaE 864.2227500897734 561.77470127908 662.8782261191445
aaeA 4380.496862018132 2618.7642171454263 5816.7951770285545
aaeB 7846.3239697416175 7792.316476955105 10312.333492435688

Further reading: ReadXL is an R package designed to import Excel data into R


  1. Great Article! This blog is useful for everyone. Turnao is a web-based application used to Convert excel spreadsheet to online database.

  2. Replies
    1. Probably yes - with the command
      apt install ssconvert
      Thanks for introducing me to Termux. It looks like an interesting tool for turning the mobile handset into a fully functional development device

  3. Nice blog! Blog is very useful for everyone who want to store excel data in online excel database. Trunao provide online tool to convert your for excel sheet in online database.

    1. Hey Alfred just gone through with your above link, Its impressive
      Everyone should try this once. It converts 10k Excel rows very quickly
      please check its features

  4. Thank you, very usefull tool.
    Just the installation on Debian can be only with a command:
    apr install gnumeric


Post a Comment

Popular posts from this blog

Data analysis step 8: Pathway analysis with GSEA

Installing R-4.0 on Ubuntu 18.04 painlessly

EdgeR or DESeq2? Comparing the performance of differential expression tools