Extract data from a spreadsheet file on the linux command line

Sometimes we need to extract data from an Excel spreadsheet for analysis. Here is one approach using the ssconvert tool.

If this isnt installed on your linux machine then you most likely can get it from the package repository.

$ sudo apt install ssconvert

Then if you want to extract a spreadsheet file into a tsv it can be done like this:

$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' SomeData.xlsx SomeData.xlsx.tsv


You will notice that all the sheets are output to separate tsv files. This approach is nice as it can accommodate high throughput screening, as I implemented in my Gene Name Errors paper a while back.

Here is an example of obtaining some data from GEO.

$ #first download
$ curl 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE80251&format=file&file=GSE80251%5Fprocessed%5FRNA%5Fexpression%5Fmnfyap%2Exlsx' > GSE80251.xlsx


$ #now extract
$ ssconvert -S --export-type Gnumeric_stf:stf_assistant -O 'separator="'$'\t''"' GSE80251.xlsx GSE80251.xlsx.txt

$ head -5 GSE80251.xlsx.txt.0 | cut -f-4
gene Untreated.AVE Erythromycin.AVE Clindamycin.AVE
aaaD 464.2738789413636 216.166053873777 908.8802442142787
aaaE 864.2227500897734 561.77470127908 662.8782261191445
aaeA 4380.496862018132 2618.7642171454263 5816.7951770285545
aaeB 7846.3239697416175 7792.316476955105 10312.333492435688


Further reading: ReadXL is an R package designed to import Excel data into R
https://readxl.tidyverse.org/

Comments

  1. Great Article! This blog is useful for everyone. Turnao is a web-based application used to Convert excel spreadsheet to online database.

    ReplyDelete
  2. Replies
    1. Probably yes - with the command
      apt install ssconvert
      Thanks for introducing me to Termux. It looks like an interesting tool for turning the mobile handset into a fully functional development device

      Delete
  3. Nice blog! Blog is very useful for everyone who want to store excel data in online excel database. Trunao provide online tool to convert your for excel sheet in online database.

    ReplyDelete

Post a Comment

Popular posts from this blog

EdgeR or DESeq2? Comparing the performance of differential expression tools

Data analysis step 8: Pathway analysis with GSEA

Installing R-4.0 on Ubuntu 18.04 painlessly