Computing Staff
  • 3

Delete Duplicate Rows In Each CSV And Some Rows

I have a folder with hundreds of CSV files. I want a batch file that loops through the files one by one and deletes any duplicate rows in each file.

Besides the duplicate rows, there are some rows I want to delete based on the value in the second column. For example:

Name,Test Name
City,Mumbai
Country,IN

I want to delete every row whose second column is IN (here, the Country,IN row).

Is it possible to do this with a Windows batch script?

2 Answers

  1. :: =====  script starts here  ===============
    :: pass 1 removes duplicate rows; pass 2 removes rows whose 2nd column is IN
    :: rubal.bat  2013-02-27 17:27:30.37
    @echo off > newfile & setLocal enableDELAYedeXpansioN
    
    for /f "tokens=* delims= " %%x in ('dir/b *.csv') do (
    type nul > newfile
      for /f "tokens=* delims= " %%o in (%%x) do (
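      rem findstr /x /l /c: compares whole lines literally, so a row is not skipped just for being a substring of an earlier row
      findstr /x /l /c:"%%o" newfile > nul || >> newfile echo.%%o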
      )
    copy newfile %%x > nul
    )
    del newfile
    
    for /f "tokens=* delims= " %%c in ('dir/b *.csv') do (
    if exist # del #
      for /f "tokens=* delims= " %%a in (%%c) do (
      set "S=%%a" & set "Z=!S:,= !"
      call :sub1 !Z!
      if defined S >> # echo.!S!
      )
    copy # %%c
    )
    del #
    goto :eof
    
    :sub1
      if '%2' equ 'IN' set S=
    goto :eof
    ::======  script ends here  =================
    
    
  2. The following untested batch script does the job, assuming that
    1) duplicate rows are adjacent, since each row is compared only with the row before it;
    2) the rows to delete are exactly those containing the text ,IN

    @echo off & setlocal EnableDelayedExpansion
    pushd Your_Folder
    for %%i in (*.csv) do (
    set row=
    for /F "delims=" %%j in ('type "%%i" ^| find /V ",IN"') do (
    if not "%%j"=="!row!" echo.%%j>> "%%~ni.tmp"
    set row=%%j
    )
    )
    del *.csv
    ren *.tmp *.csv
    popd
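
    If either assumption does not hold for your data, a variant along the following lines may help. It is also untested and only a sketch: it compares the second comma-separated field against IN exactly (so a value such as INDIA is kept) and checks each row against everything already written, so duplicates need not be adjacent. It assumes fields contain no double quotes, and it has known limits: a row with an empty first field is not recognised, because for /F collapses consecutive delimiters; blank lines and lines starting with ; are skipped by for /F; and the findstr lookup becomes slow on large files. Your_Folder is the same placeholder as above.

    ```batch
    @echo off
    setlocal DisableDelayedExpansion
    pushd "Your_Folder"
    for %%f in (*.csv) do (
        rem Start with an empty temporary output file for this CSV
        type nul > "%%~nf.tmp"
        for /F "usebackq delims=" %%r in ("%%f") do (
            rem Split the row on commas; %%b receives the second field
            for /F "tokens=1,2 delims=," %%a in ("%%r") do (
                if not "%%b"=="IN" (
                    rem Append the row only if the identical line is not already in the output
                    findstr /l /x /c:"%%r" "%%~nf.tmp" > nul || >> "%%~nf.tmp" echo(%%r
                )
            )
        )
        rem Replace the original file with the filtered copy
        move /y "%%~nf.tmp" "%%f" > nul
    )
    popd
    ```

    Because each file is replaced in place, try it on a copy of the folder first.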
