Custom Data Upload
Importing a user dataset into PandaOmics is a two-step process:

  1. Upload a file containing -omics data, e.g., a gene or protein expression matrix.
  2. Upload a file describing the sample metadata attributes.

      Both the -omics data and the metadata should be formatted as either comma-separated (CSV) or tab-separated (TSV) text files.

      The Data Upload button is located in the main menu at the top of every page in PandaOmics.

      You can import a new dataset on the "Data Upload" page by dragging and dropping your files. Omics data files should be arranged as a matrix, with samples in columns and genes/proteins in rows (see the example below).
      The following -omics types are supported:
      • Transcriptomics: both microarray and RNA-Seq data
      • Proteomics: absolute quantification of protein levels in samples
      • Methylomics: methylation levels of CpG sites should be summarized at the gene level (the TSS1500 region is recommended). Both M- and Beta-values are supported
      After the upload of raw omics data and metadata attributes is complete, the following steps are performed to ensure that the dataset will be analyzed smoothly:
      1. Gene/protein identifiers are converted to HGNC gene symbols. Unrecognized genes will be filtered out of the analysis.
      2. Samples and/or genes mostly containing missing values (NAs) will be filtered out of the analysis.
      3. Duplicated genes/proteins will be eliminated.
      4. The data will be log-transformed and normalized as necessary.
      5. When dealing with methylome datasets, Beta-values will be automatically converted to M-values, as the latter are more suitable for differential analysis.
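Two of the steps above, filtering entries that are mostly missing and converting Beta-values to M-values, can be sketched in a few lines of Python. This is only an illustration, not PandaOmics code: the function names, the 50% missing-value threshold, and the clamping constant `eps` are assumptions chosen for the example. The conversion uses the standard logit relationship M = log2(Beta / (1 - Beta)).

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta-value (0..1) to an M-value: log2(beta / (1 - beta))."""
    # Clamp away from 0 and 1 so the logarithm stays finite (eps is an assumption).
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def drop_mostly_missing(matrix, max_missing_fraction=0.5):
    """Keep rows whose share of missing values (None) does not exceed the threshold.

    The 0.5 threshold is hypothetical; the platform's actual cutoff is not documented here.
    """
    kept = {}
    for gene, values in matrix.items():
        if not values:
            continue
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[gene] = values
    return kept


# Toy Beta-value matrix: gene -> one value per sample (None marks a missing value).
matrix = {
    "GENE_A": [0.5, 0.8, None],
    "GENE_B": [None, None, 0.2],
}

filtered = drop_mostly_missing(matrix)
m_values = {
    gene: [None if v is None else beta_to_m(v) for v in values]
    for gene, values in filtered.items()
}
print(m_values)
```

In this toy input, GENE_B is dropped because two of its three values are missing, and the remaining Beta-values are mapped to the M-value scale.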
      The following gene identifiers are allowed:
      • Gene Symbol ID
      • RefSeq ID
      • Ensembl Gene ID
      • UCSC Gene ID
      • UniProt ID
      • Entrez Gene ID
      Data value formats are automatically recognized by the system and treated accordingly:

      • Positive integer: a number that indicates gene counts
      • Positive decimal: a normalized gene expression signal
      • Positive and negative decimal: normalized, log-transformed gene expression values

      Please note that proteomics data should not be log-transformed.
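The distinction between the three value formats above can be approximated with a simple rule-based check. The sketch below is a guess at how such recognition might work, not the system's actual logic. The function name and the returned labels are hypothetical, and treating zero as a valid count is an assumption.

```python
def classify_values(values):
    """Guess the value format of a list of numbers (missing values already removed)."""
    if any(v < 0 for v in values):
        # Negative numbers only appear after a log transform.
        return "log-transformed"
    if all(float(v).is_integer() for v in values):
        # Whole, non-negative numbers look like raw counts.
        return "counts"
    # Non-negative numbers with fractional parts look like a normalized signal.
    return "normalized"


print(classify_values([0, 12, 350]))
print(classify_values([0.5, 12.3, 7.0]))
print(classify_values([-1.2, 3.4, 0.1]))
```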
      Example data matrix
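As a stand-in illustration of the layout described earlier (samples in columns, genes/proteins in rows), the following Python sketch writes a tiny matrix as CSV or TSV text. The expression values are made up for layout purposes only, and the name of the first header cell (`gene`) and the sample names are assumptions rather than requirements stated in this guide.

```python
import csv
import io

# Hypothetical toy values, for layout illustration only.
rows = [
    ["gene", "sample_1", "sample_2", "sample_3"],  # header: identifier column, then samples
    ["TP53", 120, 98, 143],
    ["EGFR", 15, 22, 9],
    ["BRCA1", 60, 71, 55],
]


def write_matrix(rows, delimiter=","):
    """Serialize the matrix as delimited text (comma for CSV, tab for TSV)."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


matrix_csv = write_matrix(rows)
matrix_tsv = write_matrix(rows, delimiter="\t")
print(matrix_csv)
```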
      File formatting options are recognized by the system automatically. Manual selection of those parameters is available in advanced settings:
      1. File encoding
      2. Value delimiter: a comma, a tab, a space, etc.
      3. Missing value indicator. The following symbols are handled as missing values: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
      4. Decimal separator: a dot or a comma
      5. Handling missing values: detected missing values can be imputed with "Zero", "Geometric mean", or "Arithmetic mean", depending on data distribution
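To make the missing-value and decimal-separator options concrete, here is a minimal Python sketch that parses individual cells and imputes missing entries with one of the three named methods. It is an illustration under assumptions: the function names are hypothetical, and imputing across a single list of values (rather than, say, per gene or per sample) is a choice made for the example, since this guide does not specify the scope.

```python
import statistics

# Symbols treated as missing values, as listed above.
MISSING_TOKENS = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null",
}


def parse_cell(text, decimal_separator="."):
    """Return a float, or None when the cell holds a missing-value symbol."""
    token = text.strip()
    if token in MISSING_TOKENS:
        return None
    if decimal_separator == ",":
        token = token.replace(",", ".")
    return float(token)


def impute(values, method="Zero"):
    """Fill None entries using the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if method == "Zero":
        fill = 0.0
    elif method == "Arithmetic mean":
        fill = statistics.fmean(observed)
    elif method == "Geometric mean":
        # Only defined for positive values, e.g. non-log-transformed signals.
        fill = statistics.geometric_mean(observed)
    else:
        raise ValueError(f"Unknown imputation method: {method}")
    return [fill if v is None else v for v in values]


# One row of a file that uses a comma as the decimal separator.
raw_row = ["2,0", "NA", "8,0"]
parsed = [parse_cell(cell, decimal_separator=",") for cell in raw_row]
print(impute(parsed, "Geometric mean"))
```

Note that a file using a comma as the decimal separator cannot also use a comma as the value delimiter, so such files would typically be tab-separated.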
      To complete the data import, enter the name of the dataset, specify the omics technology (Microarray, RNA-Seq, Proteomics, or Methylation), and click the upload button.
      The upload process may take some time, depending on the file size. In the meantime, you can continue using PandaOmics and return to the data upload later. Only one dataset can be uploaded at a time.

      Once a dataset is uploaded, clicking the 'Confirm' button redirects you to the 'Dataset' page, which shows general information about the dataset, including the total number of samples and genes/proteins. Here you can also click the 'Add Metadata' button to open the 'Sample Metadata' upload page, where you can drag and drop a file containing sample metadata.

      A metadata file can be associated with a dataset. Metadata annotations can then be used to set up sample groups. Below is an example of a metadata table. The first column must always contain sample identifiers that match the sample IDs in the original dataset matrix.
      Once the metadata file is uploaded, you can browse the 'Metadata Statistics', which include:
      1. The total number of detected sample attributes
      2. The number of samples from the -omics data file matching sample names from the metadata file
      3. The list of metadata attributes, including attribute type.
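Before uploading, the same kind of statistics can be checked locally. The Python sketch below compares the sample IDs in the header of the -omics matrix with the first column of the metadata file. It is a rough pre-upload check under assumptions, not a description of how PandaOmics computes its statistics: the function name, the returned keys, and the toy file contents are all hypothetical.

```python
import csv
import io


def metadata_statistics(matrix_text, metadata_text, delimiter=","):
    """Compare sample IDs in the matrix header with the first metadata column."""
    # Matrix header: the first cell labels the gene column, the rest are sample IDs.
    matrix_header = next(csv.reader(io.StringIO(matrix_text), delimiter=delimiter))
    sample_ids = matrix_header[1:]

    metadata_rows = [
        row for row in csv.reader(io.StringIO(metadata_text), delimiter=delimiter) if row
    ]
    attributes = metadata_rows[0][1:]  # every column after the sample ID column
    metadata_ids = {row[0] for row in metadata_rows[1:]}

    return {
        "attribute_count": len(attributes),
        "attributes": attributes,
        "matched_samples": sum(s in metadata_ids for s in sample_ids),
        "unmatched_samples": [s for s in sample_ids if s not in metadata_ids],
    }


# Toy inputs with made-up values; sample s3 has no metadata row.
matrix_text = "gene,s1,s2,s3\nTP53,1,2,3\n"
metadata_text = "sample_id,condition,age\ns1,case,54\ns2,control,61\ns4,case,47\n"

stats = metadata_statistics(matrix_text, metadata_text)
print(stats)
```

Any sample listed under `unmatched_samples` has no corresponding metadata row, so it is worth fixing ID mismatches before creating sample groups.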

      Clicking on the Confirm button will redirect you to the Dataset entity page where you can start creating experimental sample groups and run case/control comparisons.

      Once the dataset is uploaded, it can be found in the Data Manager.
      Training Video