This page collects some self-written toolbox files in multiple languages, organized into detailed categories.

mxnet toolbox

The recently updated mxnet toolbox provides some examples in ‘ipynb’ format; however, some of them cannot run directly because the code needs data. The following Python files help download the relevant datasets and generate the .rec files required by MXNet RecordIO.

mxnet notebook

  • get_cifar100.py
  • generate_cifar100_rec.py

    These two files should be copied to “mxnet-root/examples/notebook” and run before you run cifar-100.ipynb. They provide the dataset, but note that you may need to change the dataset path when you run the cifar-100.ipynb notebook example.

mxnet im2rec.py and make_list.py

  • im2rec.py
  • make_list.py

    These two files are the original old mxnet tools. make_list.py generates a .lst file that describes where your images are stored, in the form: image_index \t label \t image_path. If you store the images in subdirectories, you should set --recursive to True.

    ## if your data are stored like this:
    --- data
        --- MCF7
            --- mcf7_1.jpg
            --- mcf7_2.jpg
            .......
        --- OAC
        --- OST
        --- PBMC
        --- THP1

    ## in your terminal you should type:
    python make_list.py data mydata --train_ratio=0.8 --recursive=True

    ## parameter: mydata is the prefix and you can use any prefix you like...
    ## parameter: data is the root path

    ## likewise after you generate the .lst file, you can use im2rec.py to generate .rec database like:
    python im2rec.py mydata_train data --quality=100

    ## parameter: mydata_train is the name of the .lst file without suffix
    ## parameter: data is the root path
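    The listing step can be sketched in plain Python. This is a simplified illustration of what make_list.py roughly does, not the actual tool; the function name `make_lst` and its arguments are invented here:

    ```python
    import os
    import random

    def make_lst(root, prefix, train_ratio=0.8):
        """Walk root's subdirectories (one per class) and write
        prefix_train.lst / prefix_val.lst with lines of the form:
        image_index \t label \t image_path (path relative to root)."""
        classes = sorted(d for d in os.listdir(root)
                         if os.path.isdir(os.path.join(root, d)))
        entries = []
        for label, cls in enumerate(classes):
            for name in sorted(os.listdir(os.path.join(root, cls))):
                entries.append((label, os.path.join(cls, name)))
        random.shuffle(entries)  # shuffle before splitting
        n_train = int(len(entries) * train_ratio)
        for split, part in (("train", entries[:n_train]),
                            ("val", entries[n_train:])):
            with open("%s_%s.lst" % (prefix, split), "w") as f:
                for idx, (label, path) in enumerate(part):
                    f.write("%d\t%d\t%s\n" % (idx, label, path))
    ```

    The real im2rec.py then reads each line of this file, loads the image at image_path relative to the root, and packs it into the .rec database.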

mxnet rec database

  • create_rec.py

    This file creates an mxnet rec database from .npz files. Since the file is hard-coded, you need to change it slightly according to your data. The npz files should be named ‘train.npz’, ‘val.npz’ or ‘test.npz’, and each contains two variables: X_train, y_train for train.npz; X_val, y_val for val.npz; X_test, y_test for test.npz, with X_train.shape = (num_train, C, height, width) and y_train.shape = (num_train,).
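    The expected npz layout can be sketched as follows. This is a hedged illustration (the helper names `load_split` and `iter_records` are invented, not part of create_rec.py); the actual .rec packing step, done with mx.recordio in the real script, is only mentioned in a comment here since mxnet is not imported:

    ```python
    import numpy as np

    def load_split(npz_path, split):
        """Load one split ('train', 'val' or 'test') and check the layout
        described above: X has shape (num, C, height, width), y has shape (num,)."""
        data = np.load(npz_path)
        X, y = data["X_" + split], data["y_" + split]
        assert X.ndim == 4 and y.ndim == 1 and X.shape[0] == y.shape[0]
        return X, y

    def iter_records(X, y):
        """Yield (label, CHW array) pairs, ready to be packed into a .rec file
        (e.g. with mx.recordio.IRHeader and mx.recordio.pack in the real script)."""
        for i in range(X.shape[0]):
            yield int(y[i]), X[i]
    ```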

others

extract images from .mat file

  • extract_image.py

    This Python file extracts images from .mat files. Some of it is hard-coded, but you may pick up some tricks from it (this file was originally used for Q-ATM dataset extraction).
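    The core of such a script is just scipy.io.loadmat plus a loop over the image array. A minimal sketch, assuming the .mat file stores its images in a variable named `images` with shape (num, height, width[, channels]) (both the variable name and the function below are assumptions, not taken from extract_image.py); it dumps each image as a .npy file, where a real script would encode .jpg/.png instead:

    ```python
    import os
    import numpy as np
    from scipy.io import loadmat

    def extract_images(mat_path, out_dir, var_name="images"):
        """Load a .mat file and save each image in variable var_name
        as a separate .npy file; returns the number of images written."""
        os.makedirs(out_dir, exist_ok=True)
        arr = loadmat(mat_path)[var_name]
        for i, img in enumerate(arr):
            np.save(os.path.join(out_dir, "img_%05d.npy" % i), img)
        return len(arr)
    ```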

caffe

generate dataset and txt files for further caffe lmdb usage

  • generate_caffe_Data.py

    This file is an example of generating a dataset and files for caffe to later produce lmdb data. Part of it is hard-coded, but it may help with further usage. You should set the parameters:

    • path: the root of your dataset.
    • phase: the type of your images (although naming it “phase” is a bit odd).
    • num_train, num_test, num_val: the sizes of the train, test, and validation sets.

    You will get three folders, namely train, test, and val, which store the data extracted from the whole dataset (already shuffled), and three label txt files, namely train.txt, test.txt, and val.txt.
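    The shuffle-and-split step can be sketched like this (a simplified stand-in for generate_caffe_Data.py, with the function name `split_dataset` invented here; it assumes one subdirectory per class and writes label files in caffe's usual “filename label” format):

    ```python
    import os
    import random
    import shutil

    def split_dataset(path, num_train, num_test, num_val, seed=0):
        """Shuffle all images under path (one subdirectory per class), copy them
        into train/, test/ and val/ folders, and write train.txt, test.txt,
        val.txt with lines of the form: filename label."""
        classes = sorted(d for d in os.listdir(path)
                         if os.path.isdir(os.path.join(path, d)))
        samples = [(os.path.join(path, c, f), label)
                   for label, c in enumerate(classes)
                   for f in sorted(os.listdir(os.path.join(path, c)))]
        random.Random(seed).shuffle(samples)
        splits = {"train": samples[:num_train],
                  "test": samples[num_train:num_train + num_test],
                  "val": samples[num_train + num_test:
                                 num_train + num_test + num_val]}
        for name, part in splits.items():
            out = os.path.join(path, name)
            os.makedirs(out, exist_ok=True)
            with open(os.path.join(path, name + ".txt"), "w") as f:
                for src, label in part:
                    shutil.copy(src, out)
                    f.write("%s %d\n" % (os.path.basename(src), label))
    ```

    The txt files can then be fed to caffe's convert_imageset tool to build the lmdb databases.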

create lmdb dataset for caffe usage

  • create_lmdb.py

    This file is an example of generating an lmdb dataset from images. Your datasets may not be RGB images, or your data may have higher dimensions, but you can still use this file with small changes.
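    The lmdb-writing loop itself is short. A minimal sketch, not the actual create_lmdb.py: it serializes each sample as a small struct header plus raw bytes for illustration, whereas a real caffe pipeline would serialize caffe.proto.caffe_pb2.Datum messages instead (the function name `write_lmdb` is invented here):

    ```python
    import struct
    import lmdb
    import numpy as np

    def write_lmdb(db_path, X, y):
        """Write samples into an LMDB database. Each value is a packed
        (C, H, W, label) header followed by the raw image bytes; keys are
        zero-padded indices so lmdb iterates them in order."""
        env = lmdb.open(db_path, map_size=1 << 30)
        with env.begin(write=True) as txn:
            for i, (img, label) in enumerate(zip(X, y)):
                c, h, w = img.shape
                value = struct.pack("iiii", c, h, w, int(label)) + img.tobytes()
                txn.put(b"%08d" % i, value)
        env.close()
    ```

    Because the header carries the shape, the same loop works for non-RGB or higher-dimensional data flattened to (C, H, W), which matches the “small changes” note above.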